Reputation: 2164
I have a .csv file with 12 columns, and I read it with the CSVReader class:
List<String[]> rows = reader.readAll();
However, some of the String[] arrays have fewer than 12 elements. When I debugged, I found that the cause is the formatting of the CSV text.
There are two problems:
Some columns end with a backslash. For example, "Column A content\", "Column B content" is read as one column, because \" is treated as an escaped quote.
Some cells contain \" in their content. For example, in one row, column A's content is a command line:
"d -R u+rwX \""${MYTMP}\"" > /dev/null 2>&1; rm -fr \""${MYTMP}\"" >"
I cannot think of a good replacement strategy to deal with this format problem. For example, replacing every \ with \\ works for the "contentA\","contentB" case, but it does not work when \" is part of the cell's content.
Any suggestions? You are also welcome to discuss other badly formatted CSV files that a reader failed to parse correctly, and how you solved them.
Upvotes: 0
Views: 2276
Reputation: 42030
If you have a line like the following:
"Column A content\","Column B content","d -R u+rwX \""${MYTMP}\"" > /dev/null 2>&1; rm -fr \""${MYTMP}\"" >"
try this:
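// CSVParser is the opencsv parser; its default escape character is the backslash.
CSVParser parser = new CSVParser();
String line = "\"Column A content\\\",\"Column B content\",\"d -R u+rwX \\\"\"${MYTMP}\\\"\" > /dev/null 2>&1; rm -fr \\\"\"${MYTMP}\\\"\" >\"";
// A \" directly before a comma is a trailing backslash plus the closing quote: turn it into \\"
line = line.replaceAll("\\\\\"(?=,)", "\\\\\\\\\"");
// A \"" inside a cell is an over-escaped quote: turn it into \"
line = line.replaceAll("\\\\\"\"", "\\\\\"");
String[] array = parser.parseLine(line);
for (String str : array) {
    System.out.println(str);
}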
Output:
Column A content\
Column B content
d -R u+rwX "${MYTMP}" > /dev/null 2>&1; rm -fr "${MYTMP}" >
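To see what the parser is actually handed, the two replacements can be run on their own, without the CSV library. This is only a sketch: the class name CsvLineNormalizer and the method normalize are made up for illustration, and like the snippet above it only handles a trailing backslash that is followed by a comma, not one at the end of the last column.

```java
final class CsvLineNormalizer {

    // Applies the same two regex replacements as above and returns the
    // rewritten line, so it can be inspected before it is parsed.
    static String normalize(String line) {
        // Step 1: \" directly before a comma -> \\" (escaped backslash, then closing quote)
        String step1 = line.replaceAll("\\\\\"(?=,)", "\\\\\\\\\"");
        // Step 2: \"" inside a cell -> \" (a single escaped quote)
        return step1.replaceAll("\\\\\"\"", "\\\\\"");
    }

    public static void main(String[] args) {
        // Raw text: "Column A content\","rm -fr \""${MYTMP}\"" >"
        String raw = "\"Column A content\\\",\"rm -fr \\\"\"${MYTMP}\\\"\" >\"";
        System.out.println("before: " + raw);
        System.out.println("after:  " + normalize(raw));
    }
}
```

Printing the line before and after makes it easier to check that each backslash ends up where the parser's escape rules expect it.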
Upvotes: 0
Reputation: 269
I think replacing \", with \\", will solve your problem. Unix command lines most likely do not contain a , character right after \". You may have to extend the rule to also replace \", " with \\", ", or otherwise allow for whitespace.
A special case is the end of the last column: \"<nl> should be replaced with \\"<nl>, where <nl> is whatever line delimiter you have (\r\n, \r or \n).
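The rules above could be folded into a single regex, roughly like this. It is a sketch, not a tested solution for every file: the class and method names are made up, and it assumes the raw text has no backslashes that are already escaped (a \\" before a comma would get a third backslash) and that no cell legitimately contains \" followed by a comma.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TrailingBackslashFix {

    // Matches \" when it is followed, after optional spaces or tabs, by a
    // comma, a line break (\r or \n), or the end of the input.
    private static final Pattern CLOSING =
            Pattern.compile("\\\\\"(?=[ \\t]*(?:,|\\r|\\n|$))");

    // Rewrites each such \" as \\" so that the backslash is escaped and the
    // quote closes the cell. quoteReplacement keeps the replacement literal.
    static String escapeTrailingBackslashes(String csvText) {
        return CLOSING.matcher(csvText)
                .replaceAll(Matcher.quoteReplacement("\\\\\""));
    }

    public static void main(String[] args) {
        // Raw text: "Column A content\", "Column B content\" followed by \r\n
        String raw = "\"Column A content\\\", \"Column B content\\\"\r\n";
        System.out.print(escapeTrailingBackslashes(raw));
    }
}
```

A \" in the middle of a command line, such as \""${MYTMP}, is left alone because it is not followed by a comma or a line break.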
Upvotes: 1