Thursday, February 16, 2012

Spring Batch Tutorial (Part 3)

Review

In the previous section, we wrote and discussed the Spring Batch-related classes. In this section, we will write and declare the Spring Batch-related configuration files.


Configuration

Properties File

The spring.properties file contains the database name and the names of the CSV files that we will import. It also specifies a job.commit.interval property, which sets how many records are committed per transaction (one chunk).
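To make the role of these keys concrete, here is a minimal sketch that reads such a file with java.util.Properties and validates the commit interval. Only the job.commit.interval key comes from our actual file; the other keys and all values are hypothetical placeholders, and in the real project Spring would normally inject these values into the configuration rather than through hand-written code like this.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class BatchProperties {
    // Parses properties text and returns job.commit.interval, rejecting values below 1.
    static int commitInterval(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Defaulting to 1 (commit after every record) when the key is absent is an assumption of this sketch.
        int interval = Integer.parseInt(props.getProperty("job.commit.interval", "1").trim());
        if (interval < 1) {
            throw new IllegalArgumentException("job.commit.interval must be >= 1, got " + interval);
        }
        return interval;
    }

    public static void main(String[] args) {
        // Hypothetical file contents: only the job.commit.interval key comes from the tutorial.
        String sample = String.join("\n",
                "database.name=batchdb",
                "user.csv.file=user1.csv",
                "job.commit.interval=10");
        System.out.println(commitInterval(sample)); // prints 10
    }
}
```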



Spring Batch

To configure a Spring Batch job, we first have to declare the infrastructure beans. Here are the beans that need to be declared:

  • Declare a job launcher
  • Declare a task executor to run jobs asynchronously
  • Declare a job repository for persisting job status
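The three infrastructure beans work together roughly as follows: the launcher records the job in the repository, then hands the work to the task executor so the launch call returns right away. The sketch below models this in plain Java; InMemoryJobRepository, AsyncJobLauncher, and JobStatus are hypothetical illustrations, not Spring Batch's own JobLauncher or JobRepository, which persist much richer metadata (job instances, job executions, step executions).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

enum JobStatus { STARTED, COMPLETED, FAILED }

// Hypothetical in-memory stand-in for a job repository: records the latest status of each job.
class InMemoryJobRepository {
    private final Map<String, JobStatus> statuses = new ConcurrentHashMap<>();

    void update(String jobName, JobStatus status) { statuses.put(jobName, status); }

    JobStatus statusOf(String jobName) { return statuses.get(jobName); }
}

// Hypothetical launcher: hands each job to a task executor, so launch() returns
// immediately while the job runs on another thread and reports back to the repository.
class AsyncJobLauncher {
    private final ExecutorService taskExecutor;
    private final InMemoryJobRepository repository;

    AsyncJobLauncher(ExecutorService taskExecutor, InMemoryJobRepository repository) {
        this.taskExecutor = taskExecutor;
        this.repository = repository;
    }

    Future<?> launch(String jobName, Runnable job) {
        repository.update(jobName, JobStatus.STARTED);
        return taskExecutor.submit(() -> {
            try {
                job.run();
                repository.update(jobName, JobStatus.COMPLETED);
            } catch (RuntimeException e) {
                repository.update(jobName, JobStatus.FAILED);
            }
        });
    }
}

class LauncherDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        InMemoryJobRepository repository = new InMemoryJobRepository();
        AsyncJobLauncher launcher = new AsyncJobLauncher(executor, repository);

        // get() is used here only so the demo can print the final statuses.
        launcher.launch("job1", () -> System.out.println("loading users...")).get();
        launcher.launch("brokenJob", () -> { throw new IllegalStateException("bad record"); }).get();

        System.out.println(repository.statusOf("job1"));      // COMPLETED
        System.out.println(repository.statusOf("brokenJob")); // FAILED
        executor.shutdown();
    }
}
```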

What is Spring Batch?

Spring Batch is a lightweight, comprehensive batch framework designed to enable the development of robust batch applications vital for the daily operations of enterprise systems. Spring Batch builds upon the productivity, POJO-based development approach, and general ease of use capabilities people have come to know from the Spring Framework, while making it easy for developers to access and leverage more advanced enterprise services when necessary. Spring Batch is not a scheduling framework.

Source: Spring Batch Reference Documentation

What is a JobRepository?

JobRepository is the persistence mechanism for all of the Stereotypes mentioned above. It provides CRUD operations for JobLauncher, Job, and Step implementations.

Source: Spring Batch - Chapter 3. The Domain Language of Batch

What is a JobLauncher?

JobLauncher represents a simple interface for launching a Job with a given set of JobParameters.

Source: Spring Batch - Chapter 3. The Domain Language of Batch

Here's our main configuration file:



Notice that we've also declared the following beans:
  • A JDBC template
  • User and Role ItemWriters

Job Anatomy

Before we start writing our jobs, let's first examine what constitutes a job.

What is a Job?

A Job is an entity that encapsulates an entire batch process. As is common with other Spring projects, a Job will be wired together via an XML configuration file.

Source: Spring Batch: The Domain Language of Batch: Job

Each job contains a series of steps, and each step references an ItemReader and an ItemWriter. The reader's purpose is to read records for further processing, while the writer's purpose is to write those records out (possibly in a different format).
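Combined with the job.commit.interval property, a step reads items one at a time and writes them out in chunks. Here is a minimal, hypothetical sketch of that loop; SimpleReader and SimpleWriter are simplified stand-ins loosely modeled on ItemReader and ItemWriter, and real transactions, retries, and restarts are left out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simplified, hypothetical reader/writer contracts, loosely modeled on Spring Batch's
// ItemReader (returns null when input is exhausted) and ItemWriter (receives a list of items).
interface SimpleReader<T> { T read(); }

interface SimpleWriter<T> { void write(List<T> items); }

class ChunkStep {
    // Reads items one by one and passes them to the writer in chunks of commitInterval items;
    // in a real step each chunk would be written inside its own transaction.
    static <T> int run(SimpleReader<T> reader, SimpleWriter<T> writer, int commitInterval) {
        List<T> chunk = new ArrayList<>();
        int chunksWritten = 0;
        T item;
        while ((item = reader.read()) != null) {
            chunk.add(item);
            if (chunk.size() == commitInterval) {
                writer.write(new ArrayList<>(chunk));
                chunk.clear();
                chunksWritten++;
            }
        }
        if (!chunk.isEmpty()) { // flush the final, partially filled chunk
            writer.write(new ArrayList<>(chunk));
            chunksWritten++;
        }
        return chunksWritten;
    }

    public static void main(String[] args) {
        Iterator<String> records = List.of("a", "b", "c", "d", "e").iterator();
        SimpleReader<String> reader = () -> records.hasNext() ? records.next() : null;
        List<List<String>> written = new ArrayList<>();
        int chunks = run(reader, written::add, 2);
        System.out.println(chunks + " " + written); // 3 [[a, b], [c, d], [e]]
    }
}
```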

What is a Step?

A Step is a domain object that encapsulates an independent, sequential phase of a batch job. Therefore, every Job is composed entirely of one or more steps. A Step contains all of the information necessary to define and control the actual batch processing.

Source: Spring Batch: The Domain Language of Batch: Step

Each reader is typically configured with the following properties (lineTokenizer and fieldSetMapper are set on the lineMapper itself):
  • resource - the location of the file to be imported
  • lineMapper - the mapper used to turn each line of the file into an object
  • lineTokenizer - the tokenizer that splits each line into fields
  • fieldSetMapper - the mapper that maps the resulting fields to an object
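The lineMapper is essentially the composition of the other two: the tokenizer splits a line into named fields, and the fieldSetMapper turns those fields into an object. Here is a hypothetical sketch of that chaining, using a plain Map in place of Spring Batch's FieldSet; the interface names, the Role class, and the sample line are assumptions for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical stand-ins for the two collaborators a line mapper needs:
// a tokenizer (line -> named fields) and a field-set mapper (named fields -> object).
interface SimpleTokenizer { Map<String, String> tokenize(String line); }

interface SimpleFieldSetMapper<T> { T map(Map<String, String> fields); }

// The line mapper only chains the two steps: raw line -> fields -> domain object.
class ComposedLineMapper<T> {
    private final SimpleTokenizer tokenizer;
    private final SimpleFieldSetMapper<T> fieldSetMapper;

    ComposedLineMapper(SimpleTokenizer tokenizer, SimpleFieldSetMapper<T> fieldSetMapper) {
        this.tokenizer = tokenizer;
        this.fieldSetMapper = fieldSetMapper;
    }

    T mapLine(String line) {
        return fieldSetMapper.map(tokenizer.tokenize(line));
    }
}

// Hypothetical domain object holding the two role fields named in the tutorial.
class Role {
    final String username;
    final String role;

    Role(String username, String role) {
        this.username = username;
        this.role = role;
    }

    @Override
    public String toString() { return "Role(" + username + ", " + role + ")"; }
}

class LineMapperDemo {
    public static void main(String[] args) {
        SimpleTokenizer commaTokenizer = line -> {
            String[] parts = line.split(",", -1);
            Map<String, String> fields = new HashMap<>();
            fields.put("username", parts[0].trim());
            fields.put("role", parts[1].trim());
            return fields;
        };
        ComposedLineMapper<Role> mapper =
                new ComposedLineMapper<>(commaTokenizer, f -> new Role(f.get("username"), f.get("role")));
        // Hypothetical input line; the real role1.csv contents are not shown here.
        System.out.println(mapper.mapLine("jsmith,ADMIN")); // Role(jsmith, ADMIN)
    }
}
```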

What is an ItemReader?

Although a simple concept, an ItemReader is the means for providing data from many different types of input. The most general examples include: Flat File, XML, Database.

Source: Spring Batch: ItemReaders and ItemWriters

What is an ItemWriter?

ItemWriter is similar in functionality to an ItemReader, but with inverse operations. Resources still need to be located, opened and closed but they differ in that an ItemWriter writes out, rather than reading in.

Source: Spring Batch: ItemReaders and ItemWriters

The Jobs

As discussed in Part 1, we have three jobs.

Job 1: Comma-delimited records

This job contains two steps:
  1. userLoad1 - reads user1.csv and writes the records to the database
  2. roleLoad1 - reads role1.csv and writes the records to the database

Notice that userLoad1 uses a DelimitedLineTokenizer whose field names are username, firstName, lastName, and password, whereas roleLoad1 uses the same tokenizer with the field names username and role.

Each step uses its own FieldSetMapper: UserFieldSetMapper and RoleFieldSetMapper, respectively.

What is DelimitedLineTokenizer?

Used for files where fields in a record are separated by a delimiter. The most common delimiter is a comma, but pipes or semicolons are often used as well.

Source: Spring Batch: ItemReaders and ItemWriters
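As a rough illustration of what userLoad1's tokenizer does with a line, here is a simplified, hypothetical DelimitedTokenizer that pairs each token with one of the field names listed above. It is not Spring Batch's DelimitedLineTokenizer (which also handles quoted fields, among other things), and the sample line is made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Simplified, hypothetical delimited tokenizer (not Spring Batch's DelimitedLineTokenizer):
// splits on a delimiter and pairs each token with a configured field name.
class DelimitedTokenizer {
    private final String[] names;
    private final String delimiter;

    DelimitedTokenizer(String delimiter, String... names) {
        this.delimiter = delimiter;
        this.names = names;
    }

    Map<String, String> tokenize(String line) {
        // Pattern.quote lets "|" or ";" be used literally instead of as regex syntax.
        String[] tokens = line.split(Pattern.quote(delimiter), -1);
        if (tokens.length != names.length) {
            throw new IllegalArgumentException(
                    "Expected " + names.length + " tokens but found " + tokens.length + " in: " + line);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < names.length; i++) {
            fields.put(names[i], tokens[i].trim()); // trimming is a choice made for this sketch
        }
        return fields;
    }

    public static void main(String[] args) {
        DelimitedTokenizer users =
                new DelimitedTokenizer(",", "username", "firstName", "lastName", "password");
        // Hypothetical sample line; the real user1.csv contents are not shown here.
        System.out.println(users.tokenize("jsmith,John,Smith,secret"));
        // {username=jsmith, firstName=John, lastName=Smith, password=secret}
    }
}
```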


Job 2: Fixed-length records

This job contains two steps:
  1. userLoad2 - reads user2.csv and writes the records to the database
  2. roleLoad2 - reads role2.csv and writes the records to the database

Notice that userLoad2 uses a FixedLengthTokenizer with the same field names: username, firstName, lastName, and password. However, instead of splitting on a delimiter, each field is read from a fixed column range: 1-5, 6-9, 10-16, and 17-25, where columns 1-5 hold the username, columns 6-9 the firstName, and so forth. The same idea applies to roleLoad2.

What is FixedLengthTokenizer?

Used for files where fields in a record are each a 'fixed width'. The width of each field must be defined for each record type.

Source: Spring Batch: ItemReaders and ItemWriters
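The column ranges used by userLoad2 can be illustrated with a small, hypothetical fixed-width tokenizer. The ranges are treated as 1-based and inclusive, as written above; the class name and the 25-character sample record are assumptions, not contents of user2.csv.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified, hypothetical fixed-width tokenizer. Ranges are 1-based and inclusive,
// matching how the column ranges 1-5, 6-9, 10-16, 17-25 are written in the text.
class FixedWidthTokenizer {
    private final String[] names;
    private final int[][] ranges; // each entry is {start, end}

    FixedWidthTokenizer(String[] names, int[][] ranges) {
        if (names.length != ranges.length) {
            throw new IllegalArgumentException("one range per field name");
        }
        this.names = names;
        this.ranges = ranges;
    }

    Map<String, String> tokenize(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < names.length; i++) {
            int start = ranges[i][0] - 1; // convert the 1-based start to a 0-based index
            int end = ranges[i][1];       // substring's end is exclusive, so no adjustment
            if (line.length() < end) {
                throw new IllegalArgumentException("Line too short for field " + names[i] + ": " + line);
            }
            fields.put(names[i], line.substring(start, end).trim()); // drop padding spaces
        }
        return fields;
    }

    public static void main(String[] args) {
        FixedWidthTokenizer users = new FixedWidthTokenizer(
                new String[] {"username", "firstName", "lastName", "password"},
                new int[][] {{1, 5}, {6, 9}, {10, 16}, {17, 25}});
        // Hypothetical 25-character record built from 5 + 4 + 7 + 9 space-padded columns.
        String line = "jdoe " + "John" + "Doe    " + "secret123";
        System.out.println(users.tokenize(line));
        // {username=jdoe, firstName=John, lastName=Doe, password=secret123}
    }
}
```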


Job 3: Mixed records

This job contains two steps:
  1. userLoad3 - reads user3.csv and writes the records to the database
  2. roleLoad3 - reads role3.csv and writes the records to the database

Job 3 is a mix of Job 1 and Job 2. To handle both record formats, we set our lineMapper to a PatternMatchingCompositeLineMapper.

What is PatternMatchingCompositeLineMapper?

Determines which among a list of LineTokenizers should be used on a particular line by checking against a pattern.

Source: Spring Batch: ItemReaders and ItemWriters
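Roughly, a pattern-matching line mapper keeps a set of wildcard patterns ('*' and '?'), each paired with a tokenizer, and picks the tokenizer whose pattern matches the current line. The sketch below is a simplified, hypothetical analogue: PatternDispatcher, the routing rule (lines containing a comma are treated as delimited, everything else as fixed-width), and the sample lines are all assumptions, and it simply tries patterns in registration order.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;

// Simplified, hypothetical analogue of a pattern-matching line mapper: each wildcard
// pattern ('*' = any run of characters, '?' = exactly one character) is paired with a
// tokenizer, and the first pattern that matches the whole line decides how it is split.
class PatternDispatcher {
    private final Map<Pattern, Function<String, String[]>> routes = new LinkedHashMap<>();

    PatternDispatcher register(String wildcard, Function<String, String[]> tokenizer) {
        routes.put(toRegex(wildcard), tokenizer);
        return this;
    }

    String[] tokenize(String line) {
        for (Map.Entry<Pattern, Function<String, String[]>> route : routes.entrySet()) {
            if (route.getKey().matcher(line).matches()) {
                return route.getValue().apply(line);
            }
        }
        throw new IllegalArgumentException("No pattern matches line: " + line);
    }

    // Translates '*' and '?' into regex and quotes every other character literally.
    private static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') regex.append(".*");
            else if (c == '?') regex.append('.');
            else regex.append(Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(regex.toString());
    }

    public static void main(String[] args) {
        // Hypothetical routing: comma-containing lines are delimited, everything else is fixed-width.
        PatternDispatcher mapper = new PatternDispatcher()
                .register("*,*", line -> line.split(",", -1))
                .register("*", line -> new String[] {line.substring(0, 5).trim(), line.substring(5).trim()});
        System.out.println(String.join("|", mapper.tokenize("jsmith,ADMIN"))); // jsmith|ADMIN
        System.out.println(String.join("|", mapper.tokenize("jdoe USER")));    // jdoe|USER
    }
}
```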

For the FieldSetMapper, we use a custom implementation, MultiUserFieldSetMapper, which removes a semicolon from the String. See Part 2 for the class declaration.
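The semicolon clean-up can be pictured as a small helper applied to the raw tokens before they are mapped. This is a hypothetical sketch, not the MultiUserFieldSetMapper from Part 2: which field carries the semicolon is not stated here, so this version strips ';' from every token.

```java
import java.util.Arrays;

// Hypothetical helper illustrating the clean-up step described above: strip every ';'
// from each raw token (and trim it) before the values are mapped onto a domain object.
class SemicolonCleaner {
    static String[] clean(String[] tokens) {
        return Arrays.stream(tokens)
                .map(t -> t.replace(";", "").trim())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // Hypothetical raw tokens for one mixed record.
        String[] raw = {"jsmith", "John", "Smith", "secret;"};
        System.out.println(Arrays.toString(clean(raw))); // [jsmith, John, Smith, secret]
    }
}
```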



Next

In the next section, we will run the application using Maven. Click here to proceed.