Reputation: 15212
Background
I am fairly new to Spring Batch and have the following requirement: read a file and load its records into a Map keyed by each record's unique ID.
I understand that Spring Batch has something known as chunk-oriented processing, where one configures a reader, a processor, and a writer to process a certain number of records governed by the commit-interval. This can be scaled further by using a task executor for the step, or by adding another layer of multithreading through partitioning.
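For reference, a minimal chunk-oriented step configured this way might look like the following (the step name, commit-interval of 100, and bean names are placeholders, not values from my actual job):

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

// Inside a @Configuration class
@Bean
public Step loadFileStep(StepBuilderFactory steps,
                         ItemReader<SomePOJO> reader,
                         ItemWriter<SomePOJO> writer) {
    return steps.get("loadFileStep")
            // commit-interval: each chunk reads 100 items and
            // writes them in a single transaction
            .<SomePOJO, SomePOJO>chunk(100)
            .reader(reader)
            .writer(writer)
            // optional: make the step multi-threaded, as mentioned above
            .taskExecutor(new SimpleAsyncTaskExecutor())
            .build();
}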
Question
As mentioned in the background above, I want to load my file into a Map. For the sake of discussion, let's say I implement the following ItemWriter that aggregates the chunks into a Map.
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.springframework.batch.item.ItemWriter;

public class MapItemWriter implements ItemWriter<SomePOJO> {

    private final Map<String, SomePOJO> somePojoMap;

    public MapItemWriter() {
        System.out.println("Writer created");
        somePojoMap = new ConcurrentHashMap<String, SomePOJO>();
    }

    @Override
    public void write(List<? extends SomePOJO> items) throws Exception {
        if (items != null && !items.isEmpty()) {
            for (SomePOJO data : items) {
                // key each record by its unique ID
                String uniqueId = data.getId();
                somePojoMap.put(uniqueId, data);
            }
        }
    }

    public Map<String, SomePOJO> getSomePojoMap() {
        return somePojoMap;
    }
}
Since I have access to my ItemWriter bean, I can later call getSomePojoMap() to get the aggregated Map of the records in my file; however, holding a Map like this in the ItemWriter doesn't feel like the best way to go about this. Another concern is that the use of a ConcurrentHashMap may degrade performance, but I don't see any other way to aggregate the file into a Map in a thread-safe manner.
Is there a better way to aggregate my file into a Map rather than holding a Map in my writer and using a ConcurrentHashMap?
Upvotes: 1
Views: 7693
Reputation: 441
Why not use a file-based ItemWriter?
I assume this map should be written to a file, probably a flat (txt) file.
If that is the case, try FlatFileItemWriter. In case you need to write the data to an XML file, you can use StaxEventItemWriter.
Even if you don't need to write the data to a file (you only need the map at the end of the batch processing), I think it will be "cheaper" to write the data to a file and afterwards read the whole map back from that file. Saving the map within the job scope means that this object will be persisted to the database on every chunk and retrieved from the database on every chunk, which is quite an expensive operation.
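For example, a rough FlatFileItemWriter setup might look like this (the output path and the "id"/"name" property names are assumptions about SomePOJO, not taken from your post):

import org.springframework.batch.item.file.FlatFileItemWriter;
import org.springframework.batch.item.file.transform.BeanWrapperFieldExtractor;
import org.springframework.batch.item.file.transform.DelimitedLineAggregator;
import org.springframework.context.annotation.Bean;
import org.springframework.core.io.FileSystemResource;

@Bean
public FlatFileItemWriter<SomePOJO> flatFileItemWriter() {
    FlatFileItemWriter<SomePOJO> writer = new FlatFileItemWriter<SomePOJO>();
    writer.setResource(new FileSystemResource("target/somePojos.txt"));

    // turn each SomePOJO into a comma-separated line;
    // "id" and "name" are assumed bean properties on SomePOJO
    BeanWrapperFieldExtractor<SomePOJO> extractor = new BeanWrapperFieldExtractor<SomePOJO>();
    extractor.setNames(new String[] {"id", "name"});

    DelimitedLineAggregator<SomePOJO> aggregator = new DelimitedLineAggregator<SomePOJO>();
    aggregator.setDelimiter(",");
    aggregator.setFieldExtractor(extractor);

    writer.setLineAggregator(aggregator);
    return writer;
}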
Upvotes: 0
Reputation: 97845
That's more or less it. You could make small improvements, like putting the map in a separate bean; that would let the writer bean and the map have different lifetimes, and would also decouple the map's readers from the writer. For instance, you could put the map in a job-scoped bean while keeping the writer a singleton.
You only need a ConcurrentHashMap if your job is partitioned into multiple threads (I'm assuming you don't want the map shared across jobs).
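A sketch of that separation, assuming Java config and a getId() accessor on SomePOJO (all names here are illustrative):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.springframework.batch.core.configuration.annotation.JobScope;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;

// holder bean that owns the map, separate from the writer
public class PojoMapHolder {
    private final Map<String, SomePOJO> map = new ConcurrentHashMap<String, SomePOJO>();
    public Map<String, SomePOJO> getMap() { return map; }
}

// in a @Configuration class: one fresh holder per job execution,
// while the writer itself remains a singleton
@Bean
@JobScope
public PojoMapHolder pojoMapHolder() {
    return new PojoMapHolder();
}

@Bean
public ItemWriter<SomePOJO> mapItemWriter(final PojoMapHolder holder) {
    return items -> {
        for (SomePOJO data : items) {
            holder.getMap().put(data.getId(), data);
        }
    };
}

Components that need the aggregated records then depend on PojoMapHolder rather than on the writer.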
Upvotes: 1