Reputation: 9103
I have a CSV file containing a single record whose second field spans multiple lines:
field1,"this
is still
field2","field3"
What I would like to get using Apache Camel (after parsing the file) is JSON like this:
{"field1":"field1","field2":"this
is still
field2","field3":"field3"}
but using the following code:
from('something...')
    .transform(simple('/path/demooneline.csv', File.class))
    .unmarshal().bindy(BindyType.Csv, Demo.class)
    .marshal().json(JsonLibrary.Jackson).log('${body}')
@CsvRecord(separator = ',')
class Demo {
    @JsonView
    @DataField(pos = 1)
    private String field1

    @JsonView
    @DataField(pos = 2)
    private String field2

    @JsonView
    @DataField(pos = 3)
    private String field3
}
I am getting back:
{"field1":"field1","field2":"this","field3":null},
{"field1":"is still","field2":null,"field3":null},
{"field1":null,"field2":"field3","field3":null}
It looks like the CSV was split into 3 records, one per line, instead of being read as a single record with some fields delimited by quotes. @CsvRecord already uses the double quote as its default quote character. Is there a way to parse that kind of CSV with Camel (with or without Bindy)?
Upvotes: 2
Views: 3043
Reputation: 3155
The problem is that your CSV file is not "typical". From Wikipedia:
"CSV" is not a single, well-defined format (although see RFC 4180 for one definition that is commonly used). Rather, in practice the term "CSV" refers to any file that:
- is plain text using a character set such as ASCII, Unicode, EBCDIC, or Shift JIS,
- consists of records (typically one record per line),
- with the records divided into fields separated by delimiters (typically a single reserved character such as comma, semicolon, or tab; sometimes the delimiter may include optional spaces),
- where every record has the same sequence of fields.
In your case, your record spans more than one line, which is why Camel is not parsing it as you expect: Camel assumes each line is a different record.
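Since you asked about doing it with or without Bindy: a parser that follows RFC 4180 does accept quoted fields containing line breaks. A minimal sketch of that route, assuming the camel-csv data format (which delegates to Apache Commons CSV) is on the classpath; the class name is illustrative only:

import org.apache.camel.builder.RouteBuilder;

public class Rfc4180Route extends RouteBuilder {
    @Override
    public void configure() throws Exception {
        from("file:///csvSrcDir?noop=true")
            // camel-csv uses Apache Commons CSV, which allows quoted fields
            // to contain line breaks, so the whole file is read record by record
            .unmarshal().csv()
            // the body is now a List<List<String>>, one inner list per record
            .split(body())
            .log("${body}");
    }
}

With that approach the embedded line breaks are preserved in the field values, but you get plain lists instead of the annotated Demo objects.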
Edit
As I mentioned in the comment, it looks like Camel Bindy does not handle quoted fields containing line breaks. As a workaround, you could "preprocess" the source CSV file to replace the line breaks inside the quotes. For example, using Guava:
from("file:///csvSrcDir?noop=true")
.process(new Processor() {
@Override
public void process(Exchange exchange) throws Exception {
final String inBody = exchange.getIn().getBody(String.class);
final Iterable<String> tokens = Splitter.on("\",").split(inBody);
final Iterable<String> fixedTokens = FluentIterable.from(tokens).transform(new Function<String, String>() {
@Nullable
@Override
public String apply(String input) {
return input.contains("\"\n") ? input : input.replace("\n", "<br>");
}
});
final String outBody = Joiner.on("\",").join(fixedTokens);
exchange.getOut().setBody(outBody);
}
})
.unmarshal().bindy(BindyType.Csv, Demo.class)
.split(body())
.process(new Processor() {
@Override
public void process(Exchange exchange) throws Exception {
Demo body = exchange.getIn().getBody(Demo.class);
}
});
The custom Processor converts this CSV file:
"record 1 field1","this
is still
record 1 field2","record 1 field3"
"record 2 field1","this
is still
record 2 field2","record 2 field3"
into:
"record 1 field1","this<br><br>is still<br><br> record 1 field2","record 1 field3"
"record 2 field1","this<br><br>is still<br><br> record 2 field2","record 2 field3"
which Bindy can handle.
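For reference, the replacement logic can be exercised on its own, outside of a Camel route. This is just a sketch with a hard-coded sample record; the class and variable names are made up:

import java.util.ArrayList;
import java.util.List;

import com.google.common.base.Joiner;
import com.google.common.base.Splitter;

public class PreprocessDemo {
    public static void main(String[] args) {
        String csv = "\"record 1 field1\",\"this\nis still\nrecord 1 field2\",\"record 1 field3\"";

        List<String> fixed = new ArrayList<String>();
        for (String token : Splitter.on("\",").split(csv)) {
            // a quote followed by a newline marks a record boundary, so keep it;
            // any other newline is inside a quoted field and gets replaced
            fixed.add(token.contains("\"\n") ? token : token.replace("\n", "<br>"));
        }

        // prints: "record 1 field1","this<br>is still<br>record 1 field2","record 1 field3"
        System.out.println(Joiner.on("\",").join(fixed));
    }
}

Note that the <br> placeholder survives the Bindy unmarshalling, so if you want the original line breaks back in the final JSON you have to restore them after parsing.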
Upvotes: 3