Cary Li
Cary Li

Reputation: 261

Apache common CSVParser/CSVRecord to return null for empty fields

I have a question for Apache common CSVParser/CSVRecord. Take a look at the CSV file below:

Header1,Header2,Header3
"",,"L1C3"

CSVParser/CSVRecord is returning "" for the first two columns. In my case I want to distinguish empty string("") and null values. Is there a configuration I could set to let CSVParser to return null for the second column?

Upvotes: 7

Views: 13758

Answers (4)

Carlos Jaime C. De Leon
Carlos Jaime C. De Leon

Reputation: 2896

I've used this format :

CSVFormat.RFC4180.withFirstRecordAsHeader()
   .withIgnoreSurroundingSpaces()
   .withNullString("")

Where the 2 configurations:

  1. ignore space - which trims any value on both sides, if its all space, it will be trimmed to empty space
  2. null string - which treats the empty spaces as null

Here's a sample usage:

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertNull;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import org.junit.Test;

public class CsvParseTest {

    @Test
    public void testParseWillTrimAndConvertToNull() throws Exception {
        String CSV_HEADER = "Name,MobileNo,Location";
        String CSV_ROW_1 = "abc,   ,australia"; // MobileNo is 3 whitespaces
        CSVParser parse = CSVFormat.RFC4180.withFirstRecordAsHeader().withIgnoreSurroundingSpaces().withNullString("")
                .parse(new BufferedReader(new StringReader(CSV_HEADER + "\n" + CSV_ROW_1)));

        CsvRecord rec = parse.getRecords().get(0);
        assertEquals("abc", rec.get("Name"));
        assertNull(rec.get("MobileNo"));
        assertEquals("australia", rec.get("Location"));
    }
}

Upvotes: 9

RamValli
RamValli

Reputation: 4485

In Apache commons csv 1.2, we could use the CSVFormat class method withNullString() to convert null strings to NULL. Here null string could be "" or "N/A" or "Nill" according to your requirement.

CSVFormat csvFormat = CSVFormat.DEFAULT.withNullString("");
CSVParser csvParser = new CSVParser(fileReader, csvFormat);

This would give NULL, NULL, L1C3 for the given record in question.

Note: Empty records are automatically converted to empty strings thus resulting in converted to NULL value at last.

Upvotes: 0

Cary Li
Cary Li

Reputation: 261

At the end I didn't find a good solution to return null with Apache Commons CSV library. I switched to OpenCSV 3.6 and here is the code I used, which I also posted on another thread. Thanks to everyone else that suggested OpenCSV.

CSVReaderBuilder has withFieldAsNull() for this purpose.

CSVReader csvReader = new CSVReaderBuilder(csvFileReader)
    .withFieldAsNull(CSVReaderNullFieldIndicator.EMPTY_SEPARATORS)
    .build();

Upvotes: 1

Jeronimo Backes
Jeronimo Backes

Reputation: 6289

I think uniVocity-parsers is the only library that allows you to distinguish empty strings from nulls (I know this won't address your problem with Apache Commons CSV directly, but at least there's a way to get what you need).

Here's how to do it:

public static void main(String ... args){
    String input = "Header1,Header2,Header3\n" +
            "\"\",,\"L1C3\"";

    CsvParserSettings settings = new CsvParserSettings(); //many options here, check the tutorial.
    settings.setEmptyValue("I'm empty"); //value to use when the parser finds "". Set to "" to get an empty String.
    settings.setNullValue("I'm null"); //value to use when the parser finds a null value (i.e. ,,). 

    CsvParser parser = new CsvParser(settings);
    List<String[]> allRows = parser.parseAll(new StringReader(input));

    for(String[] row : allRows){
        System.out.println(Arrays.toString(row));
    }
}

This will produce the following output:

[Header1, Header2, Header3]
[I'm empty, I'm null, L1C3]

uniVocity-parsers is also 3 times faster than Apache Commons CSV and has way more features.

Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).

Upvotes: 2

Related Questions