Rama Arjun

Reputation: 303

Defining field separator and text qualifier on the fly for CSV file

I am reading CSV files that use a comma (,) as the field separator and a double quote (") as the text qualifier. The following code splits a row into its columns:

row.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)")

This works fine. However, I want to specify the field separator and text qualifier dynamically, i.e. they will be passed as input along with the CSV file, and the file will be parsed according to them. How can I modify the regular expression above to take the field separator and text qualifier on the fly?

EDIT: I am using Apache Commons CSV to parse the CSV files. In my case, however, the header row can be any row in the file, and there is no way to pass a header row index to the Commons CSV parser. So I will read the file manually, fetch the header row, split its columns into a String array, and pass that array to the parser. Since the field separator and text qualifier are user-defined, the splitting has to be done on that basis.
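
To make the header-splitting step concrete, here is a minimal sketch of what it could look like with a small hand-written scanner instead of a regex. The class name `DelimitedLineSplitter` and its `split` method are hypothetical, not part of Commons CSV or any other library. Unlike the regex above, this version strips the qualifier characters from the result, and it assumes that a doubled qualifier inside a qualified field stands for one literal qualifier character. It handles a single line only, so fields containing line breaks are out of scope.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits one line using a caller-supplied separator and text qualifier.
class DelimitedLineSplitter {

    // Assumption: a doubled quote inside a quoted field means one literal quote character.
    static String[] split(String line, char separator, char quote) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == quote) {
                    if (i + 1 < line.length() && line.charAt(i + 1) == quote) {
                        current.append(quote); // doubled quote: keep one literal quote
                        i++;
                    } else {
                        inQuotes = false;      // closing quote
                    }
                } else {
                    current.append(c);         // separators inside quotes are plain text
                }
            } else if (c == quote) {
                inQuotes = true;               // opening quote
            } else if (c == separator) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());        // last field
        return fields.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Separator ';' and qualifier '\'' stand in for user-supplied input.
        String[] header = split("id;'last;first';city", ';', '\'');
        System.out.println(String.join(" | ", header)); // prints: id | last;first | city
    }
}
```

The resulting array could then be handed to the parser as the header, with the same separator and qualifier configured on the parser itself.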

Upvotes: 3

Views: 2471

Answers (2)

Jeronimo Backes

Reputation: 6289

uniVocity-parsers can autodetect the input format to discover what separator/quote character to use:

    CsvParserSettings settings = new CsvParserSettings(); //many options here, check the tutorial.

    // turns on automatic detection of line separators, 
    // column separators, quotes & quote escapes
    settings.detectFormatAutomatically();

    // configures to skip a number of rows from the input and start parsing from there
    settings.setNumberOfRowsToSkip(3);

    // configures the parser to extract headers from the first parsed row
    settings.setHeaderExtractionEnabled(true);

    CsvParser parser = new CsvParser(settings);
    List<String[]> rows = parser.parseAll(new File("/path/to/your/file.csv"));

Disclaimer: I'm the author of this library. It's open source and free (Apache 2.0 license).

Upvotes: 3

Boris the Spider

Reputation: 61148

TL;DR: Use a CSV parser

This is the only answer that is correct. There is only one way to parse files, and that is with a parser.

Example usage with OpenCSV (no affiliation, just my preferred choice):

try (final CSVReader reader = new CSVReader(
        new FileReader("yourfile.csv"), // your file
        '\t',                           // delimiter
        '"',                            // quote
        '\'')) {                        // escape char

}

This is fully configurable and supports escape sequences, unlike your regex solution.

Upvotes: 0
