Reputation: 1984
In a CSV file, I have a record that renders like this:
,"SKYY SPA MARTINI
2 oz. SKYY Vodka
Fresh cucumber
Fresh mint
Splash of simple syrup
Muddle cucumber & mint with syrup.
Add SKYY Vodka and shake with ice.
Strain into a chilled martini glass.
Garnish with a fresh mint sprig and cucumber slice.",
with each line ending in a LF (line feed) character.
I thought that this would be treated as a single string and that the line feeds inside it wouldn't be treated as new lines, but this isn't the case, and it is breaking my script. Is there a way to have the reader treat line breaks as record separators only when they are not enclosed in quotes? This is the code I'm currently using; I couldn't find a setting for the tokenizer that would let me do this.
// instantiate description line mapper
DelimitedLineTokenizer lineTokenizer = new DelimitedLineTokenizer();
DefaultLineMapper<LCBOProduct> lineMapper = new DefaultLineMapper<>();
lineMapper.setLineTokenizer(lineTokenizer);
lineMapper.setFieldSetMapper(fieldSetMapper);
// set description line mapper
reader.setLineMapper(lineMapper);
return reader;
Upvotes: 0
Views: 2459
Reputation: 126
Inspired by this CSV regex post, I have written a quick-and-dirty method for doing this:
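import java.util.ArrayList;
import java.util.List;

public class CsvSplitDemo {
    public static void main(String[] args) {
        // Two records separated by \r; each record also has a \r inside a quoted field.
        String line = "\"BEEP\",\"BOOP\",\"TWO SHOTS\rOF VODKA\"\r\"BOOP\",\"BEEP\",\"LEMON\rWEDGES\"";
        String quote = "\"";
        String splitter = "\r";
        String delimiter = ",";
        parse(line, delimiter, quote, splitter);
    }

    public static void parse(String data, String delimiter, String quote, String splitter) {
        // Split on the splitter only when an even number of quotes follows it,
        // i.e. when the splitter is not inside a quoted field.
        String regex = splitter + "(?=(?:[^" + quote + "]*" + quote + "[^" + quote + "]*" + quote + ")*[^" + quote + "]*$)";
        String[] lines = data.split(regex, -1);
        List<String[]> records = new ArrayList<>();
        for (String line : lines) {
            // Naive field split: a delimiter inside a quoted field is still split on.
            records.add(line.split(delimiter, -1));
        }
        for (String[] line : records) {
            for (String record : line) {
                System.out.println("RECORD: " + record); // do whatever
            }
        }
    }
}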
Of course, considering the large size of some CSV files, you will probably need to build up the input with a StringBuilder and then use myStringBuilder.toString().split(regex, -1) in the parse method.
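For large files, an alternative to running the regex over one big string is to read physical lines and glue them together until the quotes balance. The following is only a rough sketch of that idea using plain java.io, not anything Spring-specific; the class name QuotedRecordReader and the method readRecords are made up for illustration. It assumes quotes inside a field are escaped by doubling them (so the quote count per record stays even), and it normalizes embedded line breaks to \n because BufferedReader.readLine() strips the original terminator.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Spring Batch or any library.
class QuotedRecordReader {

    // Joins physical lines into logical records: a record is complete only
    // when every quote opened in it has been closed again.
    static List<String> readRecords(BufferedReader in, char quote) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean insideQuotes = false;
        try {
            String line;
            while ((line = in.readLine()) != null) {
                // Each quote character flips the state; a doubled quote flips it twice.
                for (int i = 0; i < line.length(); i++) {
                    if (line.charAt(i) == quote) {
                        insideQuotes = !insideQuotes;
                    }
                }
                current.append(line);
                if (insideQuotes) {
                    current.append('\n'); // the line break belongs to the field
                } else {
                    records.add(current.toString());
                    current.setLength(0);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (current.length() > 0) {
            records.add(current.toString()); // unterminated quote at end of input
        }
        return records;
    }

    public static void main(String[] args) {
        String csv = "1,\"SKYY SPA MARTINI\n2 oz. SKYY Vodka\",x\n2,\"PLAIN\",y\n";
        for (String r : readRecords(new BufferedReader(new StringReader(csv)), '"')) {
            System.out.println("RECORD: " + r);
        }
    }
}
```

Only one record is held in memory at a time, so the file size matters much less than with a single split over the whole content.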
This is probably not the Spring way of doing things. But as Jim Garrison commented, this is an edge case, and I'm not sure whether Spring has a way of handling it.
A more complex regex may be required if the records start using other nasty characters (commas, quotes, etc.). I don't know what the source of these records could be, but some sanitizing may be in order before splitting the file.
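If the fields do contain commas or quotes, a small state machine may be easier to get right than a bigger regex. Here is a minimal sketch, assuming the common CSV convention that a literal quote inside a quoted field is written as two quotes; QuoteAwareFieldSplitter and splitFields are hypothetical names, and the method works on one already-assembled record.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
class QuoteAwareFieldSplitter {

    // Splits one record into fields, ignoring delimiters inside quotes and
    // turning a doubled quote inside a quoted field into a single literal quote.
    static List<String> splitFields(String record, char delimiter, char quote) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean insideQuotes = false;
        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (insideQuotes) {
                if (c == quote) {
                    if (i + 1 < record.length() && record.charAt(i + 1) == quote) {
                        field.append(quote); // doubled quote: keep one, skip the other
                        i++;
                    } else {
                        insideQuotes = false; // closing quote
                    }
                } else {
                    field.append(c); // delimiters and line breaks are kept as data
                }
            } else if (c == quote) {
                insideQuotes = true; // opening quote, not part of the value
            } else if (c == delimiter) {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString()); // last field has no trailing delimiter
        return fields;
    }
}
```

Unlike line.split(delimiter, -1), this also strips the surrounding quotes from each value, which may or may not be what the rest of your pipeline expects.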
Upvotes: 1