Alex

Reputation: 151

Spring Batch: read multiple lines as one item in the reader using AggregateItemReader or another solution

I use Spring Batch to process a file with 3 million lines of data. The file is structured as follows:

ID1-Adress1-NumberPhone1
ID1-Adress2-NumberPhone2
ID1-Adress3-NumberPhone3
ID2-Adress1-NumberPhone1
ID2-Adress2-NumberPhone2
ID3-Adress1-NumberPhone1
...

I need to read the file by ID, not line by line. For example, read

ID1-Adress2-NumberPhone2
ID1-Adress3-NumberPhone3
ID2-Adress1-NumberPhone1
ID2-Adress2-NumberPhone2

Then create a Person object whose attributes are the ID and a Map<String, String> (for address, numberPhone), pass this object to the processor, then read the lines associated with the second ID, and so on, until I have a List of Person objects to give to the writer. Specifically, I need my reader to complete each multi-line object before it is sent to the processor and before the writer stores it in the database.

The approach I followed: I created a step that reads line by line and passes each line as an object to the writer. In this writer, I loop over the objects that have the same ID and map (write) them into another object, which is my complete object. A second step then takes this final list of complete objects, reads it, and inserts it into the database. The problem is that this takes a lot of time: more than 2 hours for the first step.

Is there a method or an aggregation process in the reader? I tried to follow this example, https://github.com/spring-projects/spring-batch/tree/master/spring-batch-samples/src/main/java/org/springframework/batch/sample/domain/multiline, but I did not understand it at all. I need a simple, concrete example adapted to the file format shown above.

Upvotes: 4

Views: 5448

Answers (1)

Mahmoud Ben Hassine

Reputation: 31600

You can take a look at the multiline sample. In this example, the input file has the following format (which is similar to your case):

BEGIN
INFO,UK21341EAH45,customer1
AMNT,978,98.34
END
BEGIN
INFO,UK21341EAH46,customer2
AMNT,112,18.12
END
...

A custom reader is used to aggregate items that span multiple lines. In this example, a (logical) item is delimited by BEGIN and END (physical) records. You can take a look at the MultilineTradeItemReader and adapt it to your case.
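For the ID-Address-Phone format in the question, the same idea could look roughly like the sketch below. It is not the sample's MultilineTradeItemReader: Person, PersonGroupingReader, and PersonGroupingDemo are hypothetical names, and a plain Iterator<String> stands in for the line source so the example runs without Spring. It assumes that all lines for one ID are adjacent in the file and that every line has three dash-separated fields. In a real job, the read-ahead logic would probably live in a custom ItemReader that wraps a FlatFileItemReader delegate (Spring Batch also ships SingleItemPeekableItemReader for the look-ahead part).

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical domain object: one ID plus its address -> phone number pairs.
final class Person {
    private final String id;
    private final Map<String, String> contacts = new LinkedHashMap<>();

    Person(String id) { this.id = id; }

    String getId() { return id; }
    Map<String, String> getContacts() { return contacts; }
}

// Hypothetical reader: each call to read() returns one complete Person,
// or null when the input is exhausted (same convention as ItemReader.read()).
// Assumption: lines with the same ID are adjacent in the input.
final class PersonGroupingReader {
    private final Iterator<String> lines;
    private String pending; // first line of the next group, read ahead but not yet used

    PersonGroupingReader(Iterator<String> lines) { this.lines = lines; }

    Person read() {
        String line = nextLine();
        if (line == null) {
            return null; // end of input
        }
        // Assumption: every line is ID-Address-Phone, split on the first two dashes.
        String[] fields = line.split("-", 3);
        Person person = new Person(fields[0]);
        person.getContacts().put(fields[1], fields[2]);

        // Keep consuming lines until the ID changes or the input ends.
        while ((line = nextLine()) != null) {
            fields = line.split("-", 3);
            if (!fields[0].equals(person.getId())) {
                pending = line; // belongs to the next Person, keep it for the next call
                break;
            }
            person.getContacts().put(fields[1], fields[2]);
        }
        return person;
    }

    private String nextLine() {
        if (pending != null) {
            String line = pending;
            pending = null;
            return line;
        }
        return lines.hasNext() ? lines.next() : null;
    }
}

class PersonGroupingDemo {
    public static void main(String[] args) {
        PersonGroupingReader reader = new PersonGroupingReader(Arrays.asList(
                "ID1-Adress1-NumberPhone1",
                "ID1-Adress2-NumberPhone2",
                "ID2-Adress1-NumberPhone1").iterator());
        // Prints one line per Person: the ID followed by its address/phone map.
        for (Person p = reader.read(); p != null; p = reader.read()) {
            System.out.println(p.getId() + " -> " + p.getContacts());
        }
    }
}
```

With a reader shaped like this, each chunk handed to the processor and writer already contains complete Person objects, so the intermediate step that regroups lines in the writer should no longer be needed.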

Hope this helps.

Upvotes: 2
