Reputation: 151
I use Spring Batch to process a file with 3 million lines of data. The file is structured as follows:
ID1-Address1-NumberPhone1
ID1-Address2-NumberPhone2
ID1-Address3-NumberPhone3
ID2-Address1-NumberPhone1
ID2-Address2-NumberPhone2
ID3-Address1-NumberPhone1
...
I need to read the file by ID, not line by line. For example, read the lines for the first ID:
ID1-Address1-NumberPhone1
ID1-Address2-NumberPhone2
ID1-Address3-NumberPhone3
then create a Person object whose attributes are the ID and a Map<String, String> (address, phone number), pass this object to the processor, then read the lines associated with the second ID, and so on, until I have a list of Person objects to give to the writer.
Specifically, I need my reader to assemble each complete multi-line object before it is sent to the processor and before the writer stores it in the database.
Here is what I did: I created a step that reads the file line by line and passes each line as an object to the writer. In this writer, I loop over the objects that share the same ID and map them into another object, which is my complete object. A second step then reads this final list of complete objects and inserts it into the database. The problem is that this is very slow: the first step alone takes more than 2 hours.
Is there a way to do this aggregation in the reader? I tried to follow this example: https://github.com/spring-projects/spring-batch/tree/master/spring-batch-samples/src/main/java/org/springframework/batch/sample/domain/multiline but I did not understand it at all. I need a simple, concrete example adapted to the file format above.
Upvotes: 4
Views: 5448
Reputation: 31600
You can take a look at the multiline sample. In this example, the input file has the following format (which is similar to your case):
BEGIN
INFO,UK21341EAH45,customer1
AMNT,978,98.34
END
BEGIN
INFO,UK21341EAH46,customer2
AMNT,112,18.12
END
...
A custom reader is used to aggregate items that span multiple lines. In this example, a (logical) item is delimited by BEGIN and END (physical) records. You can take a look at the MultilineTradeItemReader and adapt it to your case.
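Adapted to the ID-grouped file from the question, the idea is a reader with one line of lookahead: it starts a Person from the first line, keeps consuming lines while the next one has the same ID, and returns the Person as soon as the ID changes. Below is a minimal plain-Java sketch of that logic (not the sample's code). It assumes the lines for one ID are contiguous, as in the example file, that `-` separates exactly three fields and never appears inside them, and that an address appears only once per ID (a repeated address would overwrite the earlier phone number). `Person`, `PersonAggregatingReader` and `phonesByAddress` are hypothetical names; `read()` returns `null` at the end of input, which matches the contract of Spring Batch's `ItemReader#read()`.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PersonGroupingDemo {

    // Hypothetical domain object: one ID plus its (address -> phone number) pairs.
    static class Person {
        final String id;
        final Map<String, String> phonesByAddress = new LinkedHashMap<>();

        Person(String id) {
            this.id = id;
        }

        @Override
        public String toString() {
            return "Person[" + id + ", " + phonesByAddress + "]";
        }
    }

    // Groups consecutive lines sharing the same ID into one Person.
    // read() returns null once the input is exhausted, like ItemReader#read().
    static class PersonAggregatingReader {
        private final Iterator<String> lines;
        private String[] peeked; // one record of lookahead, already split into fields

        PersonAggregatingReader(Iterator<String> lines) {
            this.lines = lines;
        }

        // Assumption: each line is ID-Address-Phone with no '-' inside a field.
        private static String[] parse(String line) {
            String[] fields = line.split("-", 3);
            if (fields.length != 3) {
                throw new IllegalArgumentException("Malformed line: " + line);
            }
            return fields;
        }

        // Returns the next record without consuming it, or null at end of input.
        private String[] peek() {
            if (peeked == null && lines.hasNext()) {
                peeked = parse(lines.next());
            }
            return peeked;
        }

        // Returns and consumes the next record, or null at end of input.
        private String[] next() {
            String[] record = peek();
            peeked = null;
            return record;
        }

        Person read() {
            String[] first = next();
            if (first == null) {
                return null; // no more data
            }
            Person person = new Person(first[0]);
            person.phonesByAddress.put(first[1], first[2]);
            // Keep consuming while the next record still belongs to the same ID.
            while (peek() != null && peek()[0].equals(person.id)) {
                String[] record = next();
                person.phonesByAddress.put(record[1], record[2]);
            }
            return person;
        }
    }

    public static void main(String[] args) {
        List<String> input = List.of(
                "ID1-Address1-NumberPhone1",
                "ID1-Address2-NumberPhone2",
                "ID1-Address3-NumberPhone3",
                "ID2-Address1-NumberPhone1",
                "ID2-Address2-NumberPhone2",
                "ID3-Address1-NumberPhone1");
        PersonAggregatingReader reader = new PersonAggregatingReader(input.iterator());
        for (Person p = reader.read(); p != null; p = reader.read()) {
            System.out.println(p);
        }
    }
}
```

Running `main` prints one line per ID, starting with `Person[ID1, {Address1=NumberPhone1, Address2=NumberPhone2, Address3=NumberPhone3}]`. In an actual job, the same `read()` logic would go into a class implementing `ItemReader<Person>` that pulls lines from a `FlatFileItemReader`; Spring Batch's `SingleItemPeekableItemReader` offers a `peek()` that could replace the hand-written lookahead (the delegate still has to be opened and closed, for example by registering it as a stream on the step). With the aggregation done in the reader, the processor and writer receive complete `Person` objects, so the separate grouping step should no longer be necessary.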
Hope this helps.
Upvotes: 2