dkantowitz

Reputation: 1951

How do I skip white-space only lines using Super CSV?

How do I configure Super CSV to skip blank or white-space only lines?

I'm using the CsvListReader, and sometimes I get a blank line in my data. When this happens, an exception is thrown with a message to the effect of:

number of CellProcessors must match number of fields

I'd like to simply skip these lines.

Upvotes: 3

Views: 2878

Answers (2)

James Bassett

Reputation: 9868

Update: Super CSV 2.1.0 (released April 2013) allows you to supply a CommentMatcher via the preferences, which lets you skip lines that are considered comments. There are two built-in matchers you can use, or you can supply your own. In this case you could use new CommentMatches("\\s+") to skip lines that contain only white-space.
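
To see which lines the "\\s+" pattern would treat as blank, here is a minimal sketch that uses only java.util.regex and does not call Super CSV at all. It assumes the matcher tests the regex against the whole line with the line terminator already removed; that is an assumption about the library's behaviour, and the class and method names (BlankLineRegexDemo, isWhitespaceOnly, keepNonBlank) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class BlankLineRegexDemo {

    // Same regex as suggested above: one or more white-space characters.
    private static final Pattern WHITESPACE_ONLY = Pattern.compile("\\s+");

    // Assumption: the regex must match the entire line (full match),
    // and the line terminator has already been stripped.
    static boolean isWhitespaceOnly(String line) {
        return WHITESPACE_ONLY.matcher(line).matches();
    }

    // Returns the lines that this pattern would NOT filter out.
    static List<String> keepNonBlank(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (!isWhitespaceOnly(line)) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a,b", "   ", "\t", " c,d ");
        for (String line : keepNonBlank(lines)) {
            System.out.println("[" + line + "]");
        }
    }
}
```

Note that a zero-length string does not match "\\s+" (the + needs at least one character), so this pattern only covers lines with at least one white-space character. Zero-length lines are the case Super CSV already skips on its own, as described below.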


Super CSV only skips lines of zero length (just a line terminator).

It's not a valid CSV file if there are blank lines (see rule 4 of RFC 4180, which states that "Each line should contain the same number of fields throughout the file"). The only time a blank line is valid is when it's part of a multi-line field surrounded by quotes, e.g.:

column1,column2
"multi-line field

with a blank line",value2

That being said, it might be possible to make Super CSV a bit more lenient with blank lines (it could ignore them). If you could post a feature request on our SourceForge page, we can investigate this further and potentially add this functionality in a future release.

That doesn't help you right now though!

I haven't done extensive testing on this, but it should work :) You can write your own tokenizer that skips blank lines:

package org.supercsv.io;

import java.io.IOException;
import java.io.Reader;
import java.util.List;

import org.supercsv.prefs.CsvPreference;

public class SkipBlankLinesTokenizer extends Tokenizer {

    public SkipBlankLinesTokenizer(Reader reader, CsvPreference preferences) {
        super(reader, preferences);
    }

    @Override
    public boolean readColumns(List<String> columns) throws IOException {

        boolean moreInput = super.readColumns(columns);

        // Keep reading while the row is blank: either no columns at all,
        // or a single column that contains only white-space.
        while (moreInput && (columns.size() == 0 ||
                             (columns.size() == 1 &&
                              columns.get(0).trim().isEmpty()))) {
            moreInput = super.readColumns(columns);
        }

        return moreInput;
    }

}

And just pass this into the constructor of your reader (you'll have to pass the preferences into both the reader and the tokenizer):

ICsvListReader listReader = null;
try {
    CsvPreference prefs = CsvPreference.STANDARD_PREFERENCE;
    listReader = new CsvListReader(
        new SkipBlankLinesTokenizer(new FileReader(CSV_FILENAME), prefs),
        prefs);
...
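
The blank-row condition used in readColumns above can also be checked in isolation, without Super CSV on the classpath. This is only a sketch of the same boolean test; BlankRowCheck and isBlankRow are hypothetical names, and it assumes the single column is never null.

```java
import java.util.Arrays;
import java.util.List;

class BlankRowCheck {

    // Mirrors the loop condition in SkipBlankLinesTokenizer:
    // a row is blank if it has no columns, or exactly one column
    // that is empty after trimming white-space.
    // Assumption: columns.get(0) is not null.
    static boolean isBlankRow(List<String> columns) {
        return columns.isEmpty()
                || (columns.size() == 1 && columns.get(0).trim().isEmpty());
    }

    public static void main(String[] args) {
        System.out.println(isBlankRow(Arrays.asList("   ")));     // white-space only line
        System.out.println(isBlankRow(Arrays.asList("a", "b")));  // normal data row
        System.out.println(isBlankRow(Arrays.asList("", "")));    // a line containing just ","
    }
}
```

A line containing only a comma produces two columns, so it is not treated as blank by this test; only rows with zero columns or one white-space column are skipped.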

Hope this helps

Upvotes: 3

PhiLho

Reputation: 41132

I didn't know this library (you should add a Java tag...), but looking at the examples, I see they have readers that support a variable number of columns per line. An empty line is a special case of this pattern.

Alternatively (maybe less efficient), you can just catch the exception and go on with your reading...

Upvotes: 0
