Sada Shiv Dash

Reputation: 9

Merge multiple CSV files into a single CSV using Spring Batch

I have a business case of merging multiple CSV files (around 1000+, each containing 1000 records) into a single CSV using Spring Batch.

Please provide your guidance and solutions, both in terms of approach and performance.

So far, I have tried two approaches:

Approach 1:

A chunk-oriented step with a MultiResourceItemReader to read the files from the directory and a FlatFileItemWriter as the item writer.

The issue here is that it is very slow, since processing is single-threaded, but the approach works as expected.
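A minimal sketch of that configuration (all inside a @Configuration class; paths and bean names are placeholders, and the pass-through line mapping is just one way to avoid field-level parsing):

@Bean
public FlatFileItemReader<String> delegateReader() {
    FlatFileItemReader<String> reader = new FlatFileItemReader<>();
    reader.setLineMapper(new PassThroughLineMapper()); // keep each CSV line as a raw String
    return reader;
}

@Bean
public MultiResourceItemReader<String> multiResourceItemReader() throws IOException {
    MultiResourceItemReader<String> reader = new MultiResourceItemReader<>();
    reader.setResources(new PathMatchingResourcePatternResolver()
            .getResources("file:/path/to/input/*.csv"));   // placeholder input directory
    reader.setDelegate(delegateReader());
    return reader;
}

@Bean
public FlatFileItemWriter<String> flatFileItemWriter() {
    FlatFileItemWriter<String> writer = new FlatFileItemWriter<>();
    writer.setResource(new FileSystemResource("/path/to/output/merged.csv")); // placeholder output file
    writer.setLineAggregator(new PassThroughLineAggregator<>());
    return writer;
}

@Bean
public Step mergeStep(StepBuilderFactory stepBuilderFactory) throws IOException {
    return stepBuilderFactory.get("mergeStep")
            .<String, String>chunk(1000)
            .reader(multiResourceItemReader())
            .writer(flatFileItemWriter())
            .build();
}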

Approach 2: Using a MultiResourcePartitioner as the partitioner and an AsyncTaskExecutor as the task executor.

The issue here is that, since it is asynchronous and multi-threaded, data gets overwritten/corrupted while being merged into the final single file.
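The partitioned setup looks roughly like this (placeholder names; workerStep() stands for the single-file step whose reader is bound to each partition's fileName). Since every partition writes through the same FlatFileItemWriter to the same output resource, the writes presumably interleave, which matches the corruption I'm seeing:

@Bean
public MultiResourcePartitioner partitioner() throws IOException {
    MultiResourcePartitioner partitioner = new MultiResourcePartitioner();
    partitioner.setResources(new PathMatchingResourcePatternResolver()
            .getResources("file:/path/to/input/*.csv"));   // placeholder input directory
    return partitioner;                                     // creates one partition per file
}

@Bean
public Step masterStep(StepBuilderFactory stepBuilderFactory) throws IOException {
    return stepBuilderFactory.get("masterStep")
            .partitioner("workerStep", partitioner())
            .step(workerStep())                              // reads one file per partition
            .taskExecutor(new SimpleAsyncTaskExecutor())     // the async task executor mentioned above
            .build();
}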

Upvotes: 1

Views: 2467

Answers (2)

Sabir Khan

Reputation: 10142

Since your headers are common between your source and destination files, I wouldn't recommend using the Spring Batch provided readers to convert lines into specific beans: column-level information is not needed, and CSV being a text format, you can work with line-level information only, without breaking it down at the field level.

Also, partitioning per file is going to be very slow (if you have that many files); you should instead fix the number of partitions first (say 10 or 20) and group your files into that many partitions. Secondly, file writing is a disk-bound operation, not a CPU-bound one, so multi-threading won't be useful anyway.

What I suggest instead is to write your own custom reader & writer in plain Java, along the lines suggested in this answer, where your reader returns a List<String> and your writer receives a List<List<String>> that it can write out to the file.

If you have enough memory to hold the lines from all files at once, you can read everything in one go and keep returning chunk_size lines at a time; otherwise, reading a small set of files at a time until you reach the chunk size limit should be good enough. Your reader returns null when there are no more files to read.
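To illustrate, a minimal sketch of such a reader/writer pair (Spring Batch 4 style ItemReader/ItemWriter contracts; the file list, chunk size and header handling here are assumptions you would adapt, and each class goes in its own file):

import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// CsvLinesReader.java
public class CsvLinesReader implements ItemReader<List<String>> {

    private final Iterator<Path> files;
    private final int chunkSize;
    private BufferedReader current;

    public CsvLinesReader(List<Path> inputFiles, int chunkSize) {
        this.files = inputFiles.iterator();
        this.chunkSize = chunkSize;
    }

    @Override
    public List<String> read() throws IOException {
        List<String> lines = new ArrayList<>(chunkSize);
        while (lines.size() < chunkSize) {
            if (current == null) {
                if (!files.hasNext()) {
                    break;                                   // no more files to read
                }
                current = Files.newBufferedReader(files.next());
                // NOTE: skipping the (common) header of every file but the first is omitted here
            }
            String line = current.readLine();
            if (line == null) {
                current.close();
                current = null;                              // move on to the next file
            } else {
                lines.add(line);
            }
        }
        return lines.isEmpty() ? null : lines;               // null ends the step
    }
}

// CsvLinesWriter.java
public class CsvLinesWriter implements ItemWriter<List<String>> {

    private final Path output;

    public CsvLinesWriter(Path output) {
        this.output = output;
    }

    @Override
    public void write(List<? extends List<String>> items) throws IOException {
        List<String> all = new ArrayList<>();
        items.forEach(all::addAll);                          // flatten List<List<String>> into lines
        Files.write(output, all, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}

The writer simply appends every line of the chunk to the target file, so no field-level parsing is involved at any point.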

Upvotes: 0

tausif

Reputation: 151

You can wrap your FlatFileItemWriter in an AsyncItemWriter and use it together with an AsyncItemProcessor. This will not corrupt your data and will increase performance, since item processing runs on several threads (the AsyncItemWriter unwraps the resulting Futures and still writes from a single thread, which is what keeps the merged file consistent).

@Bean
public AsyncItemWriter<Customer> asyncItemWriter() throws Exception {
    AsyncItemWriter<Customer> asyncItemWriter = new AsyncItemWriter<>();

    // Delegate to the existing FlatFileItemWriter; the AsyncItemWriter unwraps
    // the Futures produced by the AsyncItemProcessor before writing.
    asyncItemWriter.setDelegate(flatFileItemWriter);
    asyncItemWriter.afterPropertiesSet();

    return asyncItemWriter;
}

@Bean
public AsyncItemProcessor<Customer, Customer> asyncItemProcessor() throws Exception {
    AsyncItemProcessor<Customer, Customer> asyncItemProcessor = new AsyncItemProcessor<>();

    // Run the actual ItemProcessor on the thread pool below.
    asyncItemProcessor.setDelegate(itemProcessor());
    asyncItemProcessor.setTaskExecutor(threadPoolTaskExecutor());
    asyncItemProcessor.afterPropertiesSet();

    return asyncItemProcessor;
}

@Bean
public TaskExecutor threadPoolTaskExecutor() {
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(10);
    executor.setMaxPoolSize(10);
    executor.setThreadNamePrefix("default_task_executor_thread");
    executor.initialize();
    return executor;
}
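For completeness, these beans would be wired into the chunk step roughly as below (a sketch in the same @Configuration class; step and reader bean names are placeholders, and java.util.concurrent.Future needs to be imported). Note the step's output item type becomes Future<Customer>:

@Bean
public Step mergeStep(StepBuilderFactory stepBuilderFactory) throws Exception {
    return stepBuilderFactory.get("mergeStep")
            .<Customer, Future<Customer>>chunk(1000)   // the AsyncItemProcessor emits Futures
            .reader(multiResourceItemReader())         // your existing reader bean
            .processor(asyncItemProcessor())
            .writer(asyncItemWriter())                 // unwraps the Futures before writing
            .build();
}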

Upvotes: 0
