Reputation: 117
I have a batch Job to read the records From SQLServer and Write into MariaDB.Even though i have implemented the concept of partition in the batch process , the process is very slow
Below is the Datasource Configuration for source and target systems.
@Bean(name = "sourceSqlServerDataSource")
public DataSource mysqlDataSource() {
HikariDataSource hikariDataSource = new HikariDataSource();
hikariDataSource.setMaximumPoolSize(100);
hikariDataSource.setUsername(username);
hikariDataSource.setPassword(password);
hikariDataSource.setJdbcUrl(jdbcUrl);
hikariDataSource.setDriverClassName(driverClassName);
hikariDataSource.setPoolName("Source-SQL-Server");
return hikariDataSource;
}
@Bean(name = "targetMySqlDataSource")
@Primary
public DataSource mysqlDataSource() {
HikariDataSource hikariDataSource = new HikariDataSource();
hikariDataSource.setMaximumPoolSize(100);
hikariDataSource.setUsername(username);
hikariDataSource.setPassword(password);
hikariDataSource.setJdbcUrl(jdbcUrl);
hikariDataSource.setDriverClassName(driverClassName);
hikariDataSource.setPoolName("Target-Myql-Server");
return hikariDataSource;
}
Below is the My Bean configured and thread pool taskexecutor
@Bean(name = "myBatchJobsThreadPollTaskExecutor")
public ThreadPoolTaskExecutor initializeThreadPoolTaskExecutor() {
ThreadPoolTaskExecutor threadPoolTaskExecutor = new ThreadPoolTaskExecutor();
threadPoolTaskExecutor.setCorePoolSize(100);
threadPoolTaskExecutor.setMaxPoolSize(200);
threadPoolTaskExecutor.setThreadNamePrefix("My-Batch-Jobs-TaskExecutor ");
threadPoolTaskExecutor.setWaitForTasksToCompleteOnShutdown(Boolean.TRUE);
threadPoolTaskExecutor.initialize();
log.info("Thread Pool Initialized with min {} and Max {} Pool Size",threadPoolTaskExecutor.getCorePoolSize(),threadPoolTaskExecutor.getMaxPoolSize() );
return threadPoolTaskExecutor;
}
Here are the step and partition step configured
@Bean(name = "myMainStep")
public Step myMainStep() throws Exception{
return stepBuilderFactory.get("myMainStep").chunk(500)
.reader(myJdbcReader(null,null))
.writer(myJpaWriter()).listener(chunkListener)
.build();
}
@Bean
public Step myPartitionStep() throws Exception {
return stepBuilderFactory.get("myPartitionStep").listener(myStepListener)
.partitioner(myMainStep()).partitioner("myPartition",myPartition)
.gridSize(50).taskExecutor(asyncTaskExecutor).build();
}
Updating the post with reader and writer
@Bean(name = "myJdbcReader")
@StepScope
public JdbcPagingItemReader myJdbcReader(@Value("#{stepExecutionContext[parameter1]}") Integer parameter1, @Value("#{stepExecutionContext[parameter2]}") Integer parameter2) throws Exception{
JdbcPagingItemReader jdbcPagingItemReader = new JdbcPagingItemReader();
jdbcPagingItemReader.setDataSource(myTargetDataSource);
jdbcPagingItemReader.setPageSize(500);
jdbcPagingItemReader.setRowMapper(myRowMapper());
Map<String,Object> paramaterMap=new HashMap<>();
paramaterMap.put("parameter1",parameter1);
paramaterMap.put("parameter2",parameter2);
jdbcPagingItemReader.setQueryProvider(myQueryProvider());
jdbcPagingItemReader.setParameterValues(paramaterMap);
return jdbcPagingItemReader;
}
@Bean(name = "myJpaWriter")
public ItemWriter myJpaWriter(){
JpaItemWriter<MyTargetTable> targetJpaWriter = new JpaItemWriter<>();
targetJpaWriter.setEntityManagerFactory(localContainerEntityManagerFactoryBean.getObject());
return targetJpaWriter;
}
Can some one throw light on how to increase the performance of read write using Spring batch...?
Upvotes: 4
Views: 9737
Reputation: 31730
Improving the performance of such an application depends on multiple parameters (grid size, chunk size, page size, thread pool size, db connection pool size, latency between db servers and your JVM, etc). So I can't give you a precise answer to your question but I will try to provide some guide lines:
.taskExecutor(asyncTaskExecutor)
and not a SimpleAsyncTaskExecutor
which does not reuse threads.There are a lot of other tricks like estimating an item size in memory and make sure the total chunk size in memory is < heap size to avoid unnecessary GC within a chunk, choosing the right GC algorithm for batch apps, etc but those are somehow advanced. The list of guide lines above is a good starting point IMO.
Hope this helps!
Upvotes: 11