StackCanary

Reputation: 117

Spring Batch: Processing multiple files with different structures

I have a use case that I'm not sure can be solved the way I want with Spring Batch.

Use case

  1. Read two types of XML files with different structures from a directory (there can be multiple files of each type) into two types of objects
  2. Process these objects
  3. Write a new flat file (.txt) which serves as a report, using data from both files/objects

Issue

If I understand the design correctly, an ItemReader reads one kind of object, an ItemProcessor takes one kind of object and returns another, and an ItemWriter writes one kind of object. As far as I know, there is no way to do chunk-oriented processing on multiple files with different structures, or rather, to have two readers, one processor and one writer in the same step.
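To make that constraint concrete, here is a stripped-down, plain-Java model of a chunk-oriented step. `StepReader`, `StepProcessor`, `StepWriter` and `MiniChunkStep` are hypothetical stand-ins that only mirror the shape of Spring Batch's `ItemReader<T>`, `ItemProcessor<I, O>` and `ItemWriter<T>`; they are not the real API. The point of the sketch is that one step is typed by a single input type `I`, so two readers producing different types have no common slot to plug into.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins that only mirror the shape of Spring Batch's
// ItemReader<T>, ItemProcessor<I, O> and ItemWriter<T>; they are not the real interfaces.
interface StepReader<T> { T read(); }                       // null signals "no more input"
interface StepProcessor<I, O> { O process(I item); }        // null filters the item out
interface StepWriter<T> { void write(List<? extends T> items); }

final class MiniChunkStep {
    // A chunk-oriented step is typed by exactly one input type I and one output type O:
    // read up to chunkSize items, process each one, then hand the chunk to the writer.
    static <I, O> void run(StepReader<I> reader, StepProcessor<I, O> processor,
                           StepWriter<O> writer, int chunkSize) {
        boolean exhausted = false;
        while (!exhausted) {
            List<O> chunk = new ArrayList<>();
            for (int read = 0; read < chunkSize; read++) {
                I item = reader.read();
                if (item == null) {
                    exhausted = true;
                    break;
                }
                O out = processor.process(item);
                if (out != null) {
                    chunk.add(out);
                }
            }
            if (!chunk.isEmpty()) {
                writer.write(chunk);
            }
        }
    }

    // Adapts a list to the reader shape, returning null once every element has been read.
    static <T> StepReader<T> fromList(List<T> items) {
        Iterator<T> it = items.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }
}
```

With a chunk size of 5 and seven input items, the writer is called twice (five items, then two). A second reader of another type simply does not fit the `run` signature.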

Any suggestions on how this can be solved in a good way?

I believe this kind of processing can be achieved using Tasklets, but I find the code often gets messy when holding data between steps in the execution context, etc.

This is a work in progress on the reader I've made for one of the types ("Car" is just a placeholder to make the example more readable):

@Bean
fun multiResourceItemReader(): ItemReader<CarData> {
    val patternResolver: ResourcePatternResolver = PathMatchingResourcePatternResolver()
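    // A plain directory path resolves to a single resource (the path itself), not to the
    // files inside it, so use a location pattern that matches the XML files
    // (narrow the pattern further if both file types share the directory)
    val resources: Array<Resource> = patternResolver.getResources("file:/some/directory/*.xml")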
    val reader: MultiResourceItemReader<CarData> = MultiResourceItemReader()
    reader.setResources(resources)
    reader.setDelegate(carItemReader())
    return reader
}

@Bean
fun carItemReader(): StaxEventItemReader<CarData> =
    StaxEventItemReaderBuilder<CarData>()
        .name("CarItemReader")
        .addFragmentRootElements("Car")
        .unmarshaller(carDataMarshaller())
        .build()

@Bean
fun carDataMarshaller(): XStreamMarshaller {
    val aliases: MutableMap<String, Class<*>> = HashMap()
    aliases["Car"] = CarData::class.java // map the fragment root element to the target type
    aliases["CarDetails"] = CarDetailType::class.java
    aliases["carProp1"] = Int::class.java
    aliases["carProp2"] = Int::class.java
    aliases["carProp3"] = Int::class.java
    aliases["carProp4"] = Int::class.java
    aliases["carProp5"] = Int::class.java

    val marshaller: XStreamMarshaller = XStreamMarshaller()
    marshaller.setAliases(aliases)
    return marshaller
}
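Conceptually, `MultiResourceItemReader` hands each resource in turn to its delegate, and `StaxEventItemReader` pulls out every `Car` fragment from it. The plain-JDK sketch below mimics that behaviour with `javax.xml.stream` over in-memory documents; it is not Spring Batch code, `FragmentScan.readFragments` is a hypothetical name, and keeping only an `id` attribute is a stand-in for unmarshalling each fragment into `CarData`.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class FragmentScan {
    // Walks each document in turn (like MultiResourceItemReader) and collects one value
    // per <fragmentRoot> element (like StaxEventItemReader with a fragment root element).
    static List<String> readFragments(List<String> documents, String fragmentRoot) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        List<String> fragments = new ArrayList<>();
        for (String doc : documents) {                      // one "resource" at a time
            try {
                XMLStreamReader r = factory.createXMLStreamReader(new StringReader(doc));
                while (r.hasNext()) {
                    if (r.next() == XMLStreamConstants.START_ELEMENT
                            && r.getLocalName().equals(fragmentRoot)) {
                        // Stand-in for unmarshalling: keep only the hypothetical "id" attribute
                        fragments.add(r.getAttributeValue(null, "id"));
                    }
                }
                r.close();
            } catch (XMLStreamException e) {
                throw new IllegalStateException("Malformed document", e);
            }
        }
        return fragments;
    }
}
```

Note that a document of the other type contributes nothing: a reader configured for `Car` fragments only ever yields one kind of item, which is the root of the problem described above.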

A Step definition would typically look something like this for a single reader, but I haven't gotten that far, as I'm still pondering how to implement the use case at all:

    stepBuilderFactory.get("step1").chunk<CarData, CarData>(5)
        .reader(multiResourceItemReader())
        .writer(someWriter())
        .build()

Upvotes: 1

Views: 1785

Answers (1)

ACH

Reputation: 195

Since multiple readers are not an option, this trick could tackle the issue:

Implement a pre-processing step that merges the two XML files, placing each file's content under a dedicated node (rootNodeA and rootNodeB respectively) inside a common root element.
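A minimal sketch of such a merge using only the JDK's DOM and `javax.xml.transform` APIs. It assumes each input is a complete XML document held in a string, and `XmlMerger.merge` is a hypothetical name. It moves the *children* of each input's root element under `rootNodeA` / `rootNodeB`, which is what the `@XmlElement` mapping in the wrapper class below would expect if `A` and `B` model the original root elements; that file layout is an assumption. In a real job this would run in a tasklet step (or a listener) before the chunk step and write the result to a file.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XmlMerger {
    // Builds <root><rootNodeA>(content of A)</rootNodeA><rootNodeB>(content of B)</rootNodeB></root>.
    static String merge(String xmlA, String xmlB) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document merged = builder.newDocument();
            Element root = merged.createElement("root");
            merged.appendChild(root);
            root.appendChild(wrap(merged, builder, xmlA, "rootNodeA"));
            root.appendChild(wrap(merged, builder, xmlB, "rootNodeB"));

            // Serialize without an XML declaration so the result is just the <root> element
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(merged), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not merge XML inputs", e);
        }
    }

    // Parses one input and copies the children of its root element under a new wrapper node.
    private static Element wrap(Document target, DocumentBuilder builder,
                                String xml, String wrapperName) throws Exception {
        Document source = builder.parse(new InputSource(new StringReader(xml)));
        Element wrapper = target.createElement(wrapperName);
        NodeList children = source.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            wrapper.appendChild(target.importNode(children.item(i), true));
        }
        return wrapper;
    }
}
```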

Encapsulate the 2 XML classes in a wrapper class:

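    import javax.xml.bind.annotation.XmlAccessType;
    import javax.xml.bind.annotation.XmlAccessorType;
    import javax.xml.bind.annotation.XmlElement;
    import javax.xml.bind.annotation.XmlRootElement;
    // (use the jakarta.xml.bind.annotation package instead on Jakarta EE 9+)

    // Maps the merged document: <root><rootNodeA>...</rootNodeA><rootNodeB>...</rootNodeB></root>
    @XmlRootElement(name = "root")
    @XmlAccessorType(XmlAccessType.FIELD)
    public class AB {

        // Content of the first file type, read from <rootNodeA>
        @XmlElement(name = "rootNodeA")
        private A a = new A();

        // Content of the second file type, read from <rootNodeB>
        @XmlElement(name = "rootNodeB")
        private B b = new B();

        //Getters & Setters
    }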

Then AB can be read and processed in the classic way, with a single reader, one processor and one writer.
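Once each item carries both halves, the processor can combine them into one report line and the writer can emit the flat .txt file. A plain-Java sketch, assuming simplified, hypothetical stand-ins for `A`, `B` and `AB` (`SimpleA`, `SimpleB`, `SimpleAB`, with invented fields) rather than the JAXB classes, and using `java.nio.file.Files` in place of a Spring Batch `FlatFileItemWriter`:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical, simplified stand-ins for A, B and the AB wrapper (fields are invented).
final class SimpleA {
    final List<String> carIds;
    SimpleA(List<String> carIds) { this.carIds = carIds; }
}

final class SimpleB {
    final List<String> labels;
    SimpleB(List<String> labels) { this.labels = labels; }
}

final class SimpleAB {
    final SimpleA a;
    final SimpleB b;
    SimpleAB(SimpleA a, SimpleB b) { this.a = a; this.b = b; }
}

final class ABReport {
    // Processor logic: both halves arrive in one item, so one report line can combine them.
    static String toReportLine(SimpleAB ab) {
        return "cars=" + ab.a.carIds.size() + ";labels=" + String.join(",", ab.b.labels);
    }

    // Writer logic: write the processed lines to a flat .txt report (overwrites the file).
    static void write(Path report, List<String> lines) {
        try {
            Files.write(report, lines, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```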

NB: It is also possible to do the pre-processing in the beforeStep method of a StepExecutionListener, and to delete the merged file in afterStep if disk space is a concern.
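A sketch of that listener lifecycle in plain Java. `StepHooks` and `MergedFileListener` are hypothetical stand-ins for a `StepExecutionListener` implementation (the real callbacks are `void beforeStep(StepExecution)` and `ExitStatus afterStep(StepExecution)`). The target path and the `Supplier` that produces the merged XML are assumptions. The merged file goes to a fixed path so that the step's reader can be configured with that same path up front.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Supplier;

// Hypothetical stand-in for the two StepExecutionListener callbacks (no Spring types used).
interface StepHooks {
    void beforeStep();
    void afterStep();
}

final class MergedFileListener implements StepHooks {
    private final Path target;                 // path the step's reader is configured with
    private final Supplier<String> mergedXml;  // produces the merged <root> document

    MergedFileListener(Path target, Supplier<String> mergedXml) {
        this.target = target;
        this.mergedXml = mergedXml;
    }

    @Override
    public void beforeStep() {
        try {
            // Write (or overwrite) the merged document before the reader needs it
            Files.write(target, mergedXml.get().getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void afterStep() {
        try {
            Files.deleteIfExists(target);      // free the disk space once the step is done
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```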

Upvotes: 1
