Freya Ren

Reputation: 2164

CSVReader cannot read a line correctly

I have a .csv file with 12 columns, and I read it with the CSVReader class:

List<String[]> rows = reader.readAll();

But I found that some of the String[] rows have fewer than 12 elements. When I debugged, I found that the cause is the formatting of the CSV text itself.

There are two problems:

  1. Some columns end with a backslash.

    For example, "Column A content\", "Column B content" is read as a single column, because \" is treated as an escaped quote rather than as a backslash followed by the closing quote.

  2. Some cells' contents have \" in them.

    For example, in one row, column A's content is a command line: "d -R u+rwX \""${MYTMP}\"" > /dev/null 2>&1; rm -fr \""${MYTMP}\"" >"

So I cannot think of a good replacement strategy to deal with this format problem. For example, replacing every \ with \\ works for the "contentA\","contentB" situation, but it does not work when \" is part of the cell's content.

Any suggestions? You are also welcome to share other formatting problems you have run into in CSV files that prevented a reader from parsing them correctly, and how you solved them.

Upvotes: 0

Views: 2276

Answers (2)

Paul Vargas

Reputation: 42030

If you have a line like the following:

"Column A content\","Column B content","d -R u+rwX \""${MYTMP}\"" > /dev/null 2>&1; rm -fr \""${MYTMP}\"" >"

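Try the following:

// CSVParser is opencsv's line parser (same library as CSVReader); import it from the package your opencsv version uses.
CSVParser parser = new CSVParser();
String line = "\"Column A content\\\",\"Column B content\",\"d -R u+rwX \\\"\"${MYTMP}\\\"\" > /dev/null 2>&1; rm -fr \\\"\"${MYTMP}\\\"\" >\"";
// 1) A \" directly before a comma is a trailing backslash plus the closing quote: escape the backslash (\" becomes \\").
line = line.replaceAll("\\\\\"(?=,)", "\\\\\\\\\"");
// 2) A \"" inside a cell is a doubly escaped quote: drop one quote (\"" becomes \").
line = line.replaceAll("\\\\\"\"", "\\\\\"");
// parseLine may throw IOException, so call this from a method that handles or declares it.
String[] array = parser.parseLine(line);
for (String str : array) {
    System.out.println(str);
}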

Output:

Column A content\
Column B content
d -R u+rwX "${MYTMP}" > /dev/null 2>&1; rm -fr "${MYTMP}" >

Upvotes: 0

Camouflage

Reputation: 269

I think that replacing \", with \\", will solve your problem. Most likely, Unix command lines do not contain a , character right after a \". You may have to extend the rule to also replace \", " with \\", " or otherwise allow for whitespace around the comma.

A special case is the end of the last column: there, \"<nl> should be replaced with \\"<nl>, where <nl> is whatever line delimiter you have (\r\n, \r or \n).
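
The two rules above can be sketched as a single regex pass over each raw line before it is handed to the parser. This is only a minimal sketch: the class and method names (TrailingBackslashFixer, fix) are hypothetical and not part of any CSV library, and it assumes that each line is processed on its own, that the file contains no already-escaped backslashes (\\), and that a cell's content never has \" immediately followed by a comma.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of opencsv): pre-processes one raw CSV line.
class TrailingBackslashFixer {
    // Matches backslash + quote when followed, after optional whitespace,
    // by a comma or by the end of the line.
    private static final Pattern CLOSING =
            Pattern.compile("\\\\\"(?=\\s*(?:,|$))");

    static String fix(String rawLine) {
        // quoteReplacement keeps the replacement literal: two backslashes + quote.
        return CLOSING.matcher(rawLine)
                      .replaceAll(Matcher.quoteReplacement("\\\\\""));
    }

    public static void main(String[] args) {
        // Raw text: "Column A content\","Column B content\"
        String raw = "\"Column A content\\\",\"Column B content\\\"";
        System.out.println(fix(raw));
    }
}
```

A \" in the middle of a cell, such as the \""${MYTMP}\"" in your command line, is not followed by a comma or the end of the line, so this pass leaves it alone; it still has to be handled separately.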

Upvotes: 1
