John Snow

Reputation: 23

Parsing a text file using Java with multiple values per line to be extracted

I'm not going to lie, I'm really bad at writing regular expressions. I'm currently trying to parse a text file that is giving me a lot of issues. The goal is to extract the data between their respective tags. The file in question is a .qbo file laid out as follows, with personal information replaced by "DATA". The parts I care about retrieving are between the "STMTTRN" and "/STMTTRN" tags. I don't plan on putting the rest in my database, but I figured it would help others to see the full file content I'm working with. I apologize for any confusion before this update.

OFXHEADER:100
DATA:OFXSGML
VERSION:102
SECURITY:NONE
ENCODING:USASCII
CHARSET:1252
COMPRESSION:NONE
OLDFILEUID:NONE
NEWFILEUID:NONE

<OFX>
<SIGNONMSGSRSV1><SONRS>
    <STATUS><CODE>0</CODE><SEVERITY>INFO</SEVERITY></STATUS>
    <DTSERVER>20190917133617.000[-4:EDT]</DTSERVER>
    <LANGUAGE>ENG</LANGUAGE>
    <FI>
        <ORG>DATA</ORG>
        <FID>DATA</FID>
    </FI>
    <INTU.BID>DATA</INTU.BID>
    <INTU.USERID>DATA</INTU.USERID>
</SONRS></SIGNONMSGSRSV1>
<BANKMSGSRSV1>
<STMTTRNRS>
    <TRNUID>0</TRNUID>
    <STATUS><CODE>0</CODE><SEVERITY>INFO</SEVERITY></STATUS>
    <STMTRS>
        <CURDEF>USD</CURDEF>
        <BANKACCTFROM>
            <BANKID>DATA</BANKID>
            <ACCTID>DATA</ACCTID>
            <ACCTTYPE>CHECKING</ACCTTYPE>
            <NICKNAME>FREEDOM CHECKING</NICKNAME>
        </BANKACCTFROM>
        <BANKTRANLIST>
            <DTSTART>20190717</DTSTART><DTEND>20190917</DTEND>
            <STMTTRN><TRNTYPE>POS</TRNTYPE><DTPOSTED>20190717071500</DTPOSTED><TRNAMT>-5.81</TRNAMT><FITID>3893120190717WO</FITID><NAME>DATA</NAME><MEMO>POS Withdrawal</MEMO></STMTTRN>
            <STMTTRN><TRNTYPE>DIRECTDEBIT</TRNTYPE><DTPOSTED>20190717085000</DTPOSTED><TRNAMT>-728.11</TRNAMT><FITID>4649920190717WE</FITID><NAME>CHASE CREDIT CRD</NAME><MEMO>DATA</MEMO></STMTTRN>
            <STMTTRN><TRNTYPE>ATM</TRNTYPE><DTPOSTED>20190717160900</DTPOSTED><TRNAMT>-201.99</TRNAMT><FITID>6674020190717WA</FITID><NAME>DATA</NAME><MEMO>ATM Withdrawal</MEMO></STMTTRN>
        </BANKTRANLIST>
        <LEDGERBAL><BALAMT>2024.16</BALAMT><DTASOF>20190917133617.000[-4:EDT]</DTASOF></LEDGERBAL>
        <AVAILBAL><BALAMT>2020.66</BALAMT><DTASOF>20190917133617.000[-4:EDT]</DTASOF></AVAILBAL>
    </STMTRS>
</STMTTRNRS>
</BANKMSGSRSV1>
</OFX>

I want to end up with data that looks or acts like the following, so that each row of data can easily be added to a database: Example Parse
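For reference, a minimal sketch of the regex route using only java.util.regex. It assumes every field tag is closed, as in the sample above (many OFX 1.x files leave leaf tags such as TRNAMT unclosed, which this would not handle). The class name StmttrnRegex and the column order in FIELDS are hypothetical choices, not anything the file format prescribes:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StmttrnRegex {
    // Tags taken from the STMTTRN records in the sample; the column order is an assumption.
    static final String[] FIELDS = {"TRNTYPE", "DTPOSTED", "TRNAMT", "FITID", "NAME", "MEMO"};

    // One whole <STMTTRN>...</STMTTRN> record. DOTALL lets a record span several lines,
    // and the reluctant .*? stops at the first closing tag instead of the last one.
    static final Pattern RECORD = Pattern.compile("<STMTTRN>(.*?)</STMTTRN>", Pattern.DOTALL);

    // One pattern per field, e.g. <TRNAMT>(.*?)</TRNAMT>, capturing only the value.
    static final Pattern[] FIELD_PATTERNS = Arrays.stream(FIELDS)
            .map(f -> Pattern.compile("<" + f + ">(.*?)</" + f + ">"))
            .toArray(Pattern[]::new);

    /** Returns one String[] per transaction, in FIELDS order; null where a tag is missing. */
    static List<String[]> parse(String fileText) {
        List<String[]> rows = new ArrayList<>();
        Matcher record = RECORD.matcher(fileText);
        while (record.find()) {
            String body = record.group(1);
            String[] row = new String[FIELDS.length];
            for (int i = 0; i < FIELDS.length; i++) {
                Matcher field = FIELD_PATTERNS[i].matcher(body);
                row[i] = field.find() ? field.group(1).trim() : null;
            }
            rows.add(row);
        }
        return rows;
    }
}
```

Because the regex looks for whole records rather than lines, it works whether each STMTTRN sits on one line (as here) or is spread over several. Each String[] maps naturally onto one INSERT with six placeholders.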

Upvotes: 1

Views: 146

Answers (3)

Artyom Rebrov

Reputation: 691

I would propose the following approach.

Read the file line by line with Files.readAllLines:

final List<String> lines = Files.readAllLines(Paths.get("/path/to/file"));

At this point you have every line of the file as a separate string, ready to be converted into something more useful. But you should create a class for the data beforehand.

Create a class for the data in a line, something like:

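import com.fasterxml.jackson.annotation.JsonProperty;

public class STMTTRN {
   // Without @JsonProperty, Jackson derives the property "trntype" from getTRNTYPE()
   // and does not match it to the upper-case <TRNTYPE> element
   @JsonProperty("TRNTYPE") private String TRNTYPE;
   @JsonProperty("DTPOSTED") private String DTPOSTED;
   ...
   ...
   //constructors (Jackson needs a no-arg one)
   //getters and setters
}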

Now that you have the data of each line in a separate string and a class to hold it, you can convert the lines to objects with Jackson's XmlMapper (from the jackson-dataformat-xml module):

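final XmlMapper xmlMapper = new XmlMapper();
// Header lines such as OFXHEADER:100 are not XML, so pick the first <STMTTRN> line rather than lines[0]
final String firstTrn = lines.stream().map(String::trim).filter(l -> l.startsWith("<STMTTRN>")).findFirst().get();
final STMTTRN stmttrn = xmlMapper.readValue(firstTrn, STMTTRN.class);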

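You may want to use a loop, or a stream with a filter, a mapper and a collector, to get the list of STMTTRN objects. Filter first, since the header lines and the other tags are not valid XML on their own:

final List<STMTTRN> stmttrnData = lines.stream().map(String::trim).filter(l -> l.startsWith("<STMTTRN>"))
        .map(this::mapLine).collect(Collectors.toList());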

Where the mapper might be:

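// One shared mapper: XmlMapper is thread-safe once configured and fairly costly to create
private static final XmlMapper XML_MAPPER = new XmlMapper();

private STMTTRN mapLine(final String line) {
    try {
        return XML_MAPPER.readValue(line, STMTTRN.class);
    } catch (IOException e) {
        // Wrap the checked exception (java.io.UncheckedIOException) so this works inside a stream
        throw new UncheckedIOException(e);
    }
}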
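If you would rather not add the Jackson dependency, the same line-by-line idea can be sketched with the JDK's built-in DOM parser. This assumes, as in the sample, that each transaction sits on one line with all tags closed; the names StmttrnLines and Stmttrn are hypothetical:

```java
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class StmttrnLines {
    // Hypothetical holder class, one field per tag seen in the sample file.
    static final class Stmttrn {
        final String trnType, dtPosted, trnAmt, fitId, name, memo;

        Stmttrn(Element e) {
            trnType = text(e, "TRNTYPE");
            dtPosted = text(e, "DTPOSTED");
            trnAmt = text(e, "TRNAMT");
            fitId = text(e, "FITID");
            name = text(e, "NAME");
            memo = text(e, "MEMO");
        }
    }

    private static final DocumentBuilderFactory FACTORY = DocumentBuilderFactory.newInstance();

    // Text of the first descendant with this tag name, or null if the tag is absent.
    static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() > 0 ? nodes.item(0).getTextContent().trim() : null;
    }

    // Parses one "<STMTTRN>...</STMTTRN>" line as a tiny standalone XML document.
    static Stmttrn mapLine(String line) {
        try {
            Element root = FACTORY.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(line)))
                    .getDocumentElement();
            return new Stmttrn(root);
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a well-formed STMTTRN line: " + line, e);
        }
    }

    // Keeps only the STMTTRN lines (header lines and other tags are not standalone XML).
    static List<Stmttrn> parse(List<String> lines) {
        return lines.stream()
                .map(String::trim)
                .filter(l -> l.startsWith("<STMTTRN>"))
                .map(StmttrnLines::mapLine)
                .collect(Collectors.toList());
    }
}
```

Usage would be along the lines of StmttrnLines.parse(Files.readAllLines(Paths.get("/path/to/file"))). Filtering on the "<STMTTRN>" prefix is the key step; without it, the first header line already fails to parse.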

Upvotes: 0

Sambit

Reputation: 8031

As David has already answered, it is a good idea to parse the XML with Java's XML tooling. If you are more interested in a regex to get all the information, you can use this regular expression (written as a Java string literal, so \\n reaches the regex engine as \n):

<[^>]+>|\\n+

You can test it on the following sites:

https://rubular.com/ https://www.regextester.com/
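The expression matches the tags and newlines rather than the values, so one way to use it (a sketch, not necessarily what was intended here) is to split on it and keep the non-empty pieces. The class name TagSplit is hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

class TagSplit {
    // The regex from above: any tag such as <TRNAMT> or </TRNAMT>, or a run of newlines.
    static final String TAG_OR_NEWLINE = "<[^>]+>|\\n+";

    // Splitting on the tags leaves only the text between them; blank pieces are dropped.
    static List<String> values(String line) {
        List<String> out = new ArrayList<>();
        for (String piece : line.split(TAG_OR_NEWLINE)) {
            String value = piece.trim();
            if (!value.isEmpty()) {
                out.add(value);
            }
        }
        return out;
    }
}
```

For the first STMTTRN line in the question this yields POS, 20190717071500, -5.81, 3893120190717WO, DATA and POS Withdrawal. The tag names are thrown away, so a value is identified only by its position; if a record omits a field, the remaining columns shift.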

Upvotes: 1

David Brossard

Reputation: 13834

Given this is XML, I would do one of two things:

  • either use the Java DOM API to parse the file into a tree of nodes and elements and read the values from it, or
  • use JAXB to achieve something similar but with a better POJO representation, binding elements directly to annotated Java classes.

Mkyong has tutorials for both: try the DOM parsing one or the JAXB one. His tutorials are simple and easy to follow.

JAXB requires more work and extra dependencies, so try DOM first.
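A minimal DOM sketch of that suggestion, using only the JDK. The header lines at the top of a .qbo file (OFXHEADER:100, DATA:OFXSGML, ...) are not XML, so it parses only from <OFX> onward. It also assumes every tag is closed, as in the sample; many real OFX 1.x (SGML) files leave leaf tags unclosed and would need cleaning up first. The class name OfxDom is hypothetical:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class OfxDom {
    /** One map per STMTTRN element: child tag name to trimmed text, in document order. */
    static List<Map<String, String>> parse(String fileText) {
        int start = fileText.indexOf("<OFX>");
        if (start < 0) {
            throw new IllegalArgumentException("No <OFX> element found");
        }
        Document doc;
        try {
            // Skip the SGML header block and parse the rest as XML.
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(fileText.substring(start))));
        } catch (Exception e) {
            throw new IllegalArgumentException("OFX body is not well-formed XML", e);
        }

        List<Map<String, String>> rows = new ArrayList<>();
        NodeList transactions = doc.getElementsByTagName("STMTTRN");
        for (int i = 0; i < transactions.getLength(); i++) {
            Map<String, String> row = new LinkedHashMap<>();
            NodeList children = transactions.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                // Skip whitespace text nodes; keep only child elements like <TRNAMT>.
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    row.put(child.getNodeName(), child.getTextContent().trim());
                }
            }
            rows.add(row);
        }
        return rows;
    }
}
```

getElementsByTagName("STMTTRN") matches that exact name only, so the enclosing STMTTRNRS element is not picked up. Each map can then be turned into one database row.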

Upvotes: 0
