Spring batch file reader record with different delimiters within a record

Question

I have the sample input file below, where the field after the 3rd "|" delimiter can be of any size. A person can have any number of addresses separated by ",", and the elements of each address are separated by ":". Is there a file reader that can handle this kind of record? Thanks.

id1|name1|male|1:new york:NY:10019, 2:philadelphia:PA:19382, 3:columbus:OH:23415|USA
id2|name2|female|1:new york:NY:10019, 2:philadelphia:PA:19382, 3:columbus:OH:23415, 4:west chester:PA:19341|USA
id3|name3|male|1:new york:NY:10019|USA
id4|name4|female|1:new york:NY:10019, 2:philadelphia:PA:19382|USA

Mahmoud Ben Hassine · Accepted Answer

This is a custom requirement and there is no built-in way to do that in Spring Batch. You can, however, use the FlatFileItemReader with a custom FieldSetMapper: let a DelimitedLineTokenizer split each line on "|", then split the 4th field yourself in the mapper. Here is a quick example:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.mapping.FieldSetMapper;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.batch.item.file.transform.FieldSet;
import org.springframework.context.ApplicationContext;
import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;
import org.springframework.validation.BindException;

@Configuration
@EnableBatchProcessing
public class MyJob {

    @Bean
    public FlatFileItemReader<Person> itemReader() {
        // Tokenize each line on "|" and delegate field mapping to PersonMapper
        DefaultLineMapper<Person> lineMapper = new DefaultLineMapper<>();
        lineMapper.setLineTokenizer(new DelimitedLineTokenizer("|"));
        lineMapper.setFieldSetMapper(new PersonMapper());
        return new FlatFileItemReaderBuilder<Person>()
                .name("personItemReader")
                .resource(new FileSystemResource("persons.csv"))
                .lineMapper(lineMapper)
                .build();
    }

    @Bean
    public Job job(JobBuilderFactory jobs, StepBuilderFactory steps) {
        return jobs.get("job")
                .start(steps.get("step")
                        .<Person, Person>chunk(2)
                        .reader(itemReader())
                        .writer(items -> items.forEach(System.out::println))
                        .build())
                .build();
    }

    public static void main(String[] args) throws Exception {
        ApplicationContext context = new AnnotationConfigApplicationContext(MyJob.class);
        JobLauncher jobLauncher = context.getBean(JobLauncher.class);
        Job job = context.getBean(Job.class);
        jobLauncher.run(job, new JobParameters());
    }

    static class Person {
        String id, name, gender, country;
        List<String> addresses = new ArrayList<>(); // TODO create and use Address class instead of String

        @Override
        public String toString() {
            return "Person{" +
                    "id='" + id + '\'' +
                    ", name='" + name + '\'' +
                    ", gender='" + gender + '\'' +
                    ", country='" + country + '\'' +
                    ", addresses=" + addresses +
                    '}';
        }
    }

    static class PersonMapper implements FieldSetMapper<Person> {
        @Override
        public Person mapFieldSet(FieldSet fieldSet) throws BindException {
            Person p = new Person();
            p.id = fieldSet.readString(0);
            p.name = fieldSet.readString(1);
            p.gender = fieldSet.readString(2);
            p.addresses.addAll(Arrays.asList(fieldSet.readString(3).split(","))); // TODO split address as needed
            p.country = fieldSet.readString(4);
            return p;
        }
    }

}

With your file as input, this prints:

Person{id='id1', name='name1', gender='male', country='USA', addresses=[1:new york:NY:10019,  2:philadelphia:PA:19382,  3:columbus:OH:23415]}
Person{id='id2', name='name2', gender='female', country='USA', addresses=[1:new york:NY:10019,  2:philadelphia:PA:19382,  3:columbus:OH:23415,  4:west chester:PA:19341]}
Person{id='id3', name='name3', gender='male', country='USA', addresses=[1:new york:NY:10019]}
Person{id='id4', name='name4', gender='female', country='USA', addresses=[1:new york:NY:10019,  2:philadelphia:PA:19382]}

As mentioned in the code comments, it is now up to you to create a domain class for addresses and parse the 4th field (index 3 in the FieldSet) as needed.
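One possible way to do that last step is sketched below in plain Java, with no Spring dependency. The names AddressParser, Address, and parseAddresses, as well as the field names id, city, state, and zip, are assumptions made for illustration; they are not from the question or from Spring Batch. The sketch trims each element, because split(",") leaves a leading space on every address after the first (which is why the output above shows two spaces between list entries).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Spring Batch or of the original answer.
class AddressParser {

    // Hypothetical domain class for one "id:city:state:zip" element.
    static final class Address {
        final String id, city, state, zip;

        Address(String id, String city, String state, String zip) {
            this.id = id;
            this.city = city;
            this.state = state;
            this.zip = zip;
        }

        @Override
        public String toString() {
            return "Address{id='" + id + "', city='" + city
                    + "', state='" + state + "', zip='" + zip + "'}";
        }
    }

    // Splits the raw 4th field on "," and then each element on ":".
    static List<Address> parseAddresses(String rawField) {
        List<Address> addresses = new ArrayList<>();
        if (rawField == null || rawField.trim().isEmpty()) {
            return addresses; // no addresses for this person
        }
        for (String element : rawField.split(",")) {
            String trimmed = element.trim();
            String[] parts = trimmed.split(":");
            if (parts.length != 4) {
                // Assumption: every address has exactly four parts, as in the sample file
                throw new IllegalArgumentException(
                        "Expected id:city:state:zip but got: " + trimmed);
            }
            addresses.add(new Address(
                    parts[0].trim(), parts[1].trim(), parts[2].trim(), parts[3].trim()));
        }
        return addresses;
    }

    public static void main(String[] args) {
        parseAddresses("1:new york:NY:10019, 2:philadelphia:PA:19382")
                .forEach(System.out::println);
    }
}
```

In the mapper, Person.addresses would then be declared as a List of Address, and the split call could be replaced with something like p.addresses.addAll(AddressParser.parseAddresses(fieldSet.readString(3))). The zip code is kept as a String so that leading zeros are not lost.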
