CuriousToLearn

Reputation: 153

Parsing a fixed-length flat XML file in Spring Batch

My XML file looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<File fileId="123" xmlns="abc:XYZ" > ABC123411/10/20
XBC128911/10/20
BCD456711/23/22
</File>

The content of the File element is fixed-length flat data. I need to parse each line, for example

ABC123411/10/20

into a Content object:

public class Content {
   // Field types were missing in the original post; String is assumed here.
   private String id;
   private String name;
   private String date;

   // getters
}

For example:

name: ABC
id: 1234
date: 11/10/20

This is what I have tried so far:

<bean id="reader" class="org.springframework.batch.item.xml.StaxEventItemReader" scope="step">
    <property name="resource" value="file:#{jobExecutionContext['source.download.filePath']}" />
    <property name="unmarshaller" ref="jaxb2Marshaller" />
    <property name="fragmentRootElementNames"  value="File">
    </property>
</bean>

<bean id="jaxb2Marshaller" class="org.springframework.oxm.jaxb.Jaxb2Marshaller">
    <property name="packagesToScan">
        <list>
            <value>com.test.model</value>
        </list>
    </property>
</bean>

And my POJO:

@XmlAccessorType(XmlAccessType.FIELD)
@XmlRootElement(name = "File", namespace = "//namespace")
public class TestRecord {

   @XmlValue
   private String data;

   public String getData() {
      return data;
   }

}

This code parses the XML file and sets the whole element text as a single String in TestRecord.data:

ABC123411/10/20
XBC128911/10/20
BCD456711/23/22

With this approach, we still need to write a mapper that splits the string from TestRecord.data on newlines, tokenizes each line, and assigns the fields to a Content object.
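The mapper described above could look roughly like the following. This is a minimal plain-Java sketch, not Spring Batch code: the ContentParser class, its parse method, and the 3/4/8 character field widths are assumptions read off the sample lines, and Content is reduced to a stand-in with String fields.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the Content class from the question (String fields assumed).
class Content {
    final String name;
    final String id;
    final String date;

    Content(String name, String id, String date) {
        this.name = name;
        this.id = id;
        this.date = date;
    }
}

// Hypothetical helper: splits the element text into lines and slices each line by position.
class ContentParser {
    // Assumed layout from the sample data: 3-char name + 4-char id + 8-char date.
    static final int RECORD_LENGTH = 15;

    static List<Content> parse(String data) {
        List<Content> result = new ArrayList<>();
        for (String raw : data.split("\\R")) {   // \R matches any line break
            String line = raw.trim();
            if (line.length() != RECORD_LENGTH) {
                continue;                          // skip blank or malformed lines
            }
            result.add(new Content(
                    line.substring(0, 3),          // name
                    line.substring(3, 7),          // id
                    line.substring(7)));           // date
        }
        return result;
    }

    public static void main(String[] args) {
        String data = " ABC123411/10/20\nXBC128911/10/20\nBCD456711/23/22\n";
        for (Content c : parse(data)) {
            System.out.println(c.name + " " + c.id + " " + c.date);
        }
    }
}
```

In a real job this logic would probably sit in an ItemProcessor or a custom reader that wraps the StaxEventItemReader, since one TestRecord expands into several Content items.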

Is there a way to do this purely in XML configuration with the readers that are already available, or is there a better option? Thanks!

Upvotes: 0

Views: 461

Answers (2)

Mahmoud Ben Hassine

Reputation: 31620

I would keep it simple and create a tasklet that transforms this:

<?xml version="1.0" encoding="UTF-8"?>
<File fileId="123" xmlns="abc:XYZ" > ABC123411/10/20
XBC128911/10/20
BCD456711/23/22
</File>

into this:

ABC123411/10/20
XBC128911/10/20
BCD456711/23/22

and then create a chunk-oriented step with a FlatFileItemReader to parse the new file. This would be simpler than trying to find a way to ignore lines, use regular expressions to parse the content, etc.
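The transformation step could be sketched as follows. This is only the core logic in plain Java using the JDK's StAX API; the XmlToFlatLines class and its extractLines method are hypothetical names, and the Tasklet wiring and the writing of the resulting lines to a new file are left out.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: collects the text content of the XML and returns it as clean lines.
class XmlToFlatLines {

    static List<String> extractLines(String xml) {
        StringBuilder text = new StringBuilder();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newFactory().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next();
                // Keep only character data; tags and attributes are dropped.
                if (event == XMLStreamConstants.CHARACTERS || event == XMLStreamConstants.CDATA) {
                    text.append(reader.getText());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read XML input", e);
        }

        List<String> lines = new ArrayList<>();
        for (String raw : text.toString().split("\\R")) {
            String line = raw.trim();
            if (!line.isEmpty()) {
                lines.add(line);   // one fixed-length record per line
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<File fileId=\"123\" xmlns=\"abc:XYZ\" > ABC123411/10/20\n"
                + "XBC128911/10/20\n"
                + "BCD456711/23/22\n"
                + "</File>";
        extractLines(xml).forEach(System.out::println);
    }
}
```

Inside a tasklet, the returned lines would be written to a temporary file (for example with java.nio.file.Files.write), and the following step would point its FlatFileItemReader at that file.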

Upvotes: 1

pete_bc

Reputation: 76

I successfully extracted the contents using RegexLineTokenizer instead of FixedLengthTokenizer. Setting strict to false prevents it from choking on lines that do not match the pattern, but it will create objects with empty properties for those lines.

   @Bean
   public static RegexLineTokenizer regexpTokenizer() {
      RegexLineTokenizer tok = new RegexLineTokenizer();
      // Groups: 3-letter name, 4-digit id, date as dd/dd/dd
      tok.setRegex("([A-Za-z]{3})(\\d{4})(\\d{2}/\\d{2}/\\d{2})");
      tok.setNames("name", "id", "date");
      tok.setStrict(false);
      return tok;
   }

Here is what that translates to as an XML configuration:

<bean id="reader" class="org.springframework.batch.item.file.FlatFileItemReader" scope="step">
    <property name="resource" value="/file path" />
    <property name="linesToSkip" value="2" />
    <property name="lineMapper">
        <bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
            <property name="lineTokenizer">
                <bean class="org.springframework.batch.item.file.transform.RegexLineTokenizer">
                    <property name="names"
                              value="name,id,date"/>
                    <property name="regex"
                              value="([A-Za-z]{3})(\d{4})(\d{2}/\d{2}/\d{2})"/>
                    <property name="strict" value="false"/>
                </bean>
            </property>
            <property name="fieldSetMapper">
                <!-- Parse the object -->
                <bean class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper">
                    <property name="prototypeBeanName" value="testRecord" />
                </bean>
            </property>
        </bean>
    </property>
</bean>
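To see what the pattern captures on the sample data, here is a small check that uses only java.util.regex. It is a sketch for trying out the regex, not the Spring tokenizer itself: RecordRegexDemo and its tokenize method are hypothetical, and the use of find() here is a choice made for this demo.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: applies the same pattern to individual lines of the sample file.
class RecordRegexDemo {
    static final Pattern RECORD =
            Pattern.compile("([A-Za-z]{3})(\\d{4})(\\d{2}/\\d{2}/\\d{2})");

    // Returns {name, id, date}, or an empty array when the line contains no record.
    static String[] tokenize(String line) {
        Matcher m = RECORD.matcher(line);
        if (!m.find()) {
            return new String[0];
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] sample = {
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>",
            "<File fileId=\"123\" xmlns=\"abc:XYZ\" > ABC123411/10/20",
            "XBC128911/10/20",
            "BCD456711/23/22",
            "</File>"
        };
        for (String line : sample) {
            System.out.println(Arrays.toString(tokenize(line)));
        }
    }
}
```

Note that in the sample file the first record sits on the same line as the opening File tag, so it is worth checking how many lines are skipped before relying on the result.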

Upvotes: 0
