matiasf

Reputation: 1118

Parsing multi-line fixed-width files

I have a fixed-width flat file. To make matters worse, each line can be either a new record or a sub-record of the line above, identified by its first character:

A0020SOME DESCRIPTION   MORE DESCRIPTION 922 2321      # Separate
A0021ANOTHER DESCRIPTIONMORE DESCRIPTION 23111442      # records
B0021ANOTHER DESCRIPTION   THIS TIME IN ANOTHER FORMAT # sub-record of record "0021"

I've tried using Flatworm, which seems to be an excellent library for parsing fixed-width data. Its documentation, unfortunately, states:

"Repeating segments are supported only for delimited files"

(ibid, "Repeating segments").

I'd rather not write a custom parser for this. Is there (1) a way to do this in Flatworm, or (2) another library that provides such (multi-line, multi-sub-record) capabilities?

Upvotes: 0

Views: 4119

Answers (3)

Jeronimo Backes

Reputation: 6289

With uniVocity-parsers you can not only read fixed-width inputs, but you can also read master-detail rows (in which a row has sub-rows).

Here's an example:

import com.univocity.parsers.common.ParsingContext;
import com.univocity.parsers.common.processor.*;
import com.univocity.parsers.fixed.*;
import java.io.FileReader;
import java.util.List;

// 1st, use a RowProcessor for the "detail" rows.
ObjectRowListProcessor detailProcessor = new ObjectRowListProcessor();

// 2nd, create a MasterDetailProcessor to identify whether or not a row is the master row.
// The row placement argument indicates whether the master row occurs before or after a sequence of "detail" rows.
MasterDetailListProcessor masterRowProcessor = new MasterDetailListProcessor(RowPlacement.TOP, detailProcessor) {
    @Override
    protected boolean isMasterRecord(String[] row, ParsingContext context) {
        // Returns true if the parsed row is the master row.
        // In the question's format, "A" lines are master rows and "B" lines are their sub-records.
        return row[0].startsWith("A");
    }
};

// The field lengths below are illustrative; adjust them to the actual column widths of your file.
FixedWidthParserSettings parserSettings = new FixedWidthParserSettings(new FixedWidthFieldLengths(4, 5, 40, 40, 8));

// Set the RowProcessor to the masterRowProcessor.
parserSettings.setRowProcessor(masterRowProcessor);

FixedWidthParser parser = new FixedWidthParser(parserSettings);
parser.parse(new FileReader(yourFile));

// Here we get the MasterDetailRecord elements.
List<MasterDetailRecord> rows = masterRowProcessor.getRecords();
for (MasterDetailRecord masterRecord : rows) {
    // The master record has one master row and multiple detail rows.
    Object[] masterRow = masterRecord.getMasterRow();
    List<Object[]> detailRows = masterRecord.getDetailRows();
}
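
For comparison, the grouping step that a master-detail processor performs can be sketched in plain Java without any library. This is only a minimal illustration under assumptions taken from the sample in the question: lines starting with "A" open a new record, lines starting with "B" belong to the closest preceding "A" line, and fields are cut by fixed column offsets. The class name MasterDetailSketch and its helpers group and field are hypothetical and not part of uniVocity-parsers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library: groups fixed-width lines
// into master records ('A') with their attached detail lines ('B').
class MasterDetailSketch {

    /** One master line plus the detail lines that follow it. */
    static final class Group {
        final String master;
        final List<String> details = new ArrayList<>();

        Group(String master) {
            this.master = master;
        }
    }

    /** Attaches every 'B' line to the closest preceding 'A' line. */
    static List<Group> group(List<String> lines) {
        List<Group> groups = new ArrayList<>();
        Group current = null;
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            char type = line.charAt(0);
            if (type == 'A') {
                // A master line starts a new group.
                current = new Group(line);
                groups.add(current);
            } else if (type == 'B') {
                if (current == null) {
                    throw new IllegalArgumentException("Detail line without a master: " + line);
                }
                current.details.add(line);
            } else {
                throw new IllegalArgumentException("Unknown record type: " + type);
            }
        }
        return groups;
    }

    /** Cuts one fixed-width field out of a line, tolerating short lines, and trims the padding. */
    static String field(String line, int start, int length) {
        int from = Math.min(start, line.length());
        int to = Math.min(start + length, line.length());
        return line.substring(from, to).trim();
    }
}
```

Assuming the record id occupies the four characters after the type marker (as the sample lines suggest), MasterDetailSketch.field(group.master, 1, 4) would extract it from a master line; the remaining offsets depend on the real file layout.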

Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).

Upvotes: 0

Skip Head

Reputation: 7760

Have you looked at JRecordBind?

http://jrecordbind.org/

"JRecordBind supports hierarchical fixed length files: records of some type that are 'sons' of other record types."

Upvotes: 2

Wilfred Springer

Reputation: 10927

Check out Preon. Although Preon targets bitstream-compressed data, you might be able to twist its arm and use it for the file format you described as well. A benefit of using Preon is that it also generates human-readable documentation of the format.

Upvotes: 0
