beldaz

Reputation: 4471

Parsing part of a CSV file in Java

I need to deal with a CSV file that actually contains several tables, like this:

"-------------------- Section 1 --------------------"

"Identity:","ABC123"
"Initials:","XY"
"Full Name:","Roger"
"Street Address:","Foo St"


"-------------------- Section 2 --------------------"

"Line","Date","Time","Status",

"1","30/01/2013","10:49:00 PM","ON",
"2","31/01/2013","8:04:00 AM","OFF",
"3","31/01/2013","11:54:00 PM","OFF",


"-------------------- Section 3 --------------------"

I'd like to parse the blocks in each section with something like commons-csv, but it would be helpful to handle each section individually, stopping at the double newline as if it were the end of the file. Has anyone tackled this problem already?

NOTE: Files can be arbitrarily long, and can contain any number of sections, so I'm after a single pass if possible. Each section appears to start with a titled heading (------- title ------\n\n) and end with two empty lines.

Upvotes: 0

Views: 1562

Answers (3)

K.Nicholas

Reputation: 11551

How about using java.io.FilterReader? You can work out which Reader methods you need to override by trial and error. Your custom class will have to read ahead an entire line and check whether it is a 'Section' line. If it is, return EOF to stop the commons-csv parser. You can then read the next section from the same custom reader. Not elegant, but it would probably work. For example:

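import java.io.BufferedReader;
import java.io.FilterReader;
import java.io.IOException;

class MyReader extends FilterReader {
    private String line;  // current line plus terminator, or null if none is loaded
    private int pos;      // index of the next character to return from line
    public MyReader(BufferedReader in) {
        super(in);
    }
    @Override
    public int read() throws IOException {
        if ( line == null || pos >= line.length() ) {
            // Load the next non-empty line.
            do {
                line = ((BufferedReader)in).readLine();
            } while ( line != null && line.length() == 0 );
            if ( line == null ) return -1;  // real end of file
            if ( line.contains("-------------------- Section ") ) {
                line = null;  // drop the heading so the next read() starts the next section
                return -1;    // pretend EOF at the section boundary
            }
            line = line + "\r\n";
            pos = 0;
        }
        return line.charAt(pos++);
    }
    // FilterReader's bulk read goes straight to the wrapped reader and would bypass
    // the section check, so route it through read(). At most one line is returned
    // per call, which keeps a single call from running across a section boundary.
    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if ( len == 0 ) return 0;
        int c = read();
        if ( c == -1 ) return -1;
        int n = 0;
        buf[off + n++] = (char) c;
        while ( n < len && pos < line.length() ) {
            buf[off + n++] = line.charAt(pos++);
        }
        return n;
    }
}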

You would use it like so:

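public void run() throws Exception {
    BufferedReader in = new BufferedReader(new FileReader(ReadRecords.class.getResource("/records.txt").getFile()));
    MyReader reader = new MyReader(in);
    int c;
    // Each loop runs until MyReader reports EOF at the next "Section" heading.
    // The first loop prints nothing if the file starts with a heading, as in the question.
    while( (c=reader.read()) != -1 ) {
        System.out.print((char)c);
    }
    // Prints the contents of Section 1.
    while( (c=reader.read()) != -1 ) {
        System.out.print((char)c);
    }
    // Prints the contents of Section 2.
    while( (c=reader.read()) != -1 ) {
        System.out.print((char)c);
    }
    reader.close();
}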
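
The three hard-coded loops above only fit a file with a known number of sections, and a -1 from read() does not say whether a heading or the real end of the file was reached. One way to handle any number of sections is sketched below. It is an illustration only: SectionReader, nextSection() and readSections() are hypothetical names, the reader extends Reader directly instead of FilterReader, and a heading is assumed to be any line starting with "---- (including the opening quote).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader that reports end-of-stream at every section heading. */
final class SectionReader extends Reader {
    private final BufferedReader in;
    private String line;       // current data line plus '\n', or null if none is loaded
    private int pos;           // index of the next character to hand out from line
    private boolean atHeading; // a heading was consumed; another section follows
    private boolean atEof;     // the underlying stream is exhausted

    SectionReader(Reader source) {
        this.in = new BufferedReader(source);
    }

    /** Advances to the next section; returns false once no further heading exists. */
    boolean nextSection() throws IOException {
        while (!atHeading && !atEof) {
            fill(); // discard whatever is left of the current section
        }
        if (!atHeading) {
            return false;
        }
        atHeading = false;
        line = null;
        return true;
    }

    /** Loads the next non-blank line, or records that a heading or EOF was hit. */
    private void fill() throws IOException {
        String next;
        do {
            next = in.readLine();
        } while (next != null && next.trim().isEmpty());
        line = null;
        if (next == null) {
            atEof = true;
        } else if (next.startsWith("\"----")) { // assumed heading marker
            atHeading = true;
        } else {
            line = next + "\n";
            pos = 0;
        }
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) return 0;
        if (line == null || pos >= line.length()) {
            if (atHeading || atEof) return -1; // stay at the boundary until nextSection()
            fill();
            if (line == null) return -1;
        }
        int n = Math.min(len, line.length() - pos);
        line.getChars(pos, pos + n, buf, off);
        pos += n;
        return n;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    /** Demo helper: collects the text of each section in a single pass. */
    static List<String> readSections(Reader source) throws IOException {
        List<String> sections = new ArrayList<>();
        try (SectionReader reader = new SectionReader(source)) {
            while (reader.nextSection()) {
                StringBuilder text = new StringBuilder();
                int c;
                while ((c = reader.read()) != -1) {
                    text.append((char) c);
                }
                sections.add(text.toString());
            }
        }
        return sections;
    }

    public static void main(String[] args) throws IOException {
        String sample = "\"---- Section 1 ----\"\n\n\"Identity:\",\"ABC123\"\n\n\n"
                + "\"---- Section 2 ----\"\n\n\"1\",\"ON\",\n\n\n"
                + "\"---- Section 3 ----\"\n";
        List<String> sections = readSections(new StringReader(sample));
        for (int i = 0; i < sections.size(); i++) {
            System.out.println("Section " + (i + 1) + ":");
            System.out.print(sections.get(i));
        }
    }
}
```

Implementing read(char[], int, int) is enough here because Reader derives the single-character read() from it. In real use the inner loop would be replaced by handing the reader to the CSV parser once per section. Note that a consumer that closes the reader it is given would also close the underlying file in this sketch.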

Upvotes: 3

PNS

Reputation: 19895

Assuming the file contains two sections, delimited as in the example, processing it is straightforward:

  1. Create a Java BufferedReader object to read the file line-by-line
  2. Read Section 1 and extract the key-value pairs
  3. Read and ignore the remaining lines, until the CSV header (Section 2)
  4. Initialize a CSV parser (commons-csv or other) using the header and the other parameters (comma separator, quotes etc.)
  5. Process every subsequent line with the parser

The parser will expose an iterator-like API that turns each line into a record object, from which the fields can be read directly. Because this approach reads one line at a time instead of loading the whole file into memory, it can handle files of any size.
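
The steps above can be sketched with the standard library alone. This is a rough illustration, not a finished parser: TwoSectionParser and its members are hypothetical names, a heading is assumed to be any line starting with "---- (including the opening quote), and the fields() helper is a naive stand-in for a real CSV parser that only works when fields are quoted and contain no embedded commas or quotes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical single-pass reader for the two-section layout in the question. */
final class TwoSectionParser {
    final Map<String, String> details = new LinkedHashMap<>(); // Section 1 key-value pairs
    final List<String> header = new ArrayList<>();             // Section 2 column names
    final List<List<String>> rows = new ArrayList<>();         // Section 2 data rows

    /** Naive splitter: assumes quoted fields with no embedded commas or quotes. */
    static List<String> fields(String line) {
        List<String> out = new ArrayList<>();
        for (String raw : line.split(",")) {
            String f = raw.trim();
            if (f.length() >= 2 && f.startsWith("\"") && f.endsWith("\"")) {
                out.add(f.substring(1, f.length() - 1)); // strip the surrounding quotes
            }
        }
        return out;
    }

    static TwoSectionParser parse(Reader source) throws IOException {
        TwoSectionParser result = new TwoSectionParser();
        int section = 0; // number of headings seen so far
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // blank lines carry no data
                }
                if (line.startsWith("\"----")) { // assumed heading marker
                    section++;
                    continue;
                }
                List<String> f = fields(line);
                if (section == 1 && f.size() == 2) {
                    result.details.put(f.get(0), f.get(1));   // step 2: key-value pairs
                } else if (section == 2 && result.header.isEmpty()) {
                    result.header.addAll(f);                  // steps 3-4: CSV header
                } else if (section == 2) {
                    result.rows.add(f);                       // step 5: data rows
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String sample = "\"---- Section 1 ----\"\n\n\"Identity:\",\"ABC123\"\n\n\n"
                + "\"---- Section 2 ----\"\n\n\"Line\",\"Status\",\n\n\"1\",\"ON\",\n";
        TwoSectionParser parsed = parse(new StringReader(sample));
        System.out.println(parsed.details);
        System.out.println(parsed.header);
        System.out.println(parsed.rows);
    }
}
```

Lines in any later section are ignored here, which matches the two-section assumption of this answer.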

Upvotes: 0

Lukas Eder

Reputation: 220762

You can use String.split() to access the individual CSV sections:

for (String csv : content.split("\"----+ Section \\d+ ----+\"")) {

    // Skip empty sections (the text before the first heading, or whitespace only)
    if (csv.trim().isEmpty()) continue;

    // parse and process each individual "csv" section here
}
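
For a quick check of what the split produces, here is a small self-contained sketch. It assumes the whole file is already in memory as a String; SplitSectionsDemo and sections() are hypothetical names, and the sample text uses shortened headings.

```java
import java.util.ArrayList;
import java.util.List;

final class SplitSectionsDemo {
    // Same pattern as above: a quoted heading such as "---- Section 2 ----".
    static final String HEADING = "\"----+ Section \\d+ ----+\"";

    /** Returns the non-blank text between headings, in file order. */
    static List<String> sections(String content) {
        List<String> result = new ArrayList<>();
        for (String csv : content.split(HEADING)) {
            String trimmed = csv.trim(); // drop the blank lines around each block
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String content = "\"---- Section 1 ----\"\n\n\"Identity:\",\"ABC123\"\n\n\n"
                + "\"---- Section 2 ----\"\n\n\"1\",\"ON\",\n\n\n"
                + "\"---- Section 3 ----\"\n";
        for (String csv : sections(content)) {
            System.out.println("[" + csv + "]");
        }
    }
}
```

Each returned string can then be wrapped in a java.io.StringReader and passed to the CSV parser.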

Upvotes: 1
