Does the uniVocity CSV parser handle rows of varying length?

Question

I have a dataset of 26 million rows, but when I parse it with the uniVocity parser it reads only 18 million rows. The number of fields per row varies from 158 to 162, and the delimiter is the ASCII control character '\u0001'.

Output of wc -l on Linux:

    $ wc -l withHeader.dat
    26351323 withHeader.dat

But the parser reports "Total # of rows in file = 18554088" (the size of the list returned by parser.parseAll()).

Can someone explain what could be the issue?

These are my parser settings:

    CsvParserSettings settings = new CsvParserSettings();
    settings.getFormat().setLineSeparator("\n");
    settings.selectFields("acctId","tcat", "transCode");
    settings.getFormat().setDelimiter('\u0001');
    //settings.setAutoConfigurationEnabled(true);
    //settings.setMaxColumns(86);
    settings.setHeaderExtractionEnabled(false);

    // creates a CSV parser
    CsvParser parser = new CsvParser(settings);
    // parses all rows in one go.
    List<String[]> allRows = parser.parseAll(newReader(filePath));
    System.out.println("Total # of rows in file = " + allRows.size());
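To cross-check the two counts independently of the parser, a plain-Java scan of the physical lines can help. The following is only a minimal sketch: `LineStats`, `Summary`, `scan` and `countFields` are hypothetical names made up for this illustration and are not part of uniVocity. It counts physical lines, builds a histogram of fields per line for the '\u0001' delimiter, and counts lines containing a double quote. That last check rests on an unconfirmed assumption: that a quote character in the data might lead a CSV parser to read several physical lines as one row.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical diagnostic helper; not part of the uniVocity API.
public class LineStats {

    /** Result of one scan over the input. */
    public static final class Summary {
        public long lines;           // physical lines seen
        public long linesWithQuote;  // lines containing at least one '"'
        // number of fields on a line -> how many lines have that many fields
        public final Map<Integer, Long> fieldCountHistogram = new TreeMap<>();
    }

    /** Counts fields on one physical line: delimiter occurrences plus one. */
    public static int countFields(String line, char delimiter) {
        int fields = 1;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == delimiter) {
                fields++;
            }
        }
        return fields;
    }

    /**
     * Reads the input line by line without any CSV quoting rules.
     * Note: readLine() also splits on a lone '\r', so the line count can
     * differ from wc -l if the file contains bare carriage returns.
     */
    public static Summary scan(Reader input, char delimiter) {
        Summary summary = new Summary();
        try (BufferedReader reader = new BufferedReader(input)) {
            String line;
            while ((line = reader.readLine()) != null) {
                summary.lines++;
                if (line.indexOf('"') >= 0) {
                    summary.linesWithQuote++;
                }
                summary.fieldCountHistogram.merge(
                        countFields(line, delimiter), 1L, Long::sum);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return summary;
    }

    public static void main(String[] args) {
        // Tiny in-memory sample standing in for the real file.
        String sample = "a\u0001b\u0001c\n"
                + "d\u0001\"e\u0001f\n"
                + "g\u0001h\n";
        Summary summary = scan(new StringReader(sample), '\u0001');
        System.out.println("physical lines    = " + summary.lines);
        System.out.println("lines with quotes = " + summary.linesWithQuote);
        System.out.println("fields per line   = " + summary.fieldCountHistogram);
    }
}
```

For the real file, a `java.io.FileReader` (or a reader with an explicit charset) would replace the `StringReader`. If the histogram shows only 158 to 162 fields per line and the line total matches wc -l, the file itself is consistent and the difference comes from how the parser groups lines into rows.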
