Reputation: 115
I have been trying to read a CSV file and add its fields to a data structure. One of the rows is malformed, and I am aware of that. I just want to skip that row and move on to the next one. But even though I am catching the exception, it still breaks the loop. Any idea what I am missing here?
My csv:
"id","name","email"
121212,"Steve","[email protected]"
121212,"Steve","[email protected]",,
121212,"Steve","[email protected]"
My code:
import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CSVTest {
    public static void main(String[] args) throws Exception {
        Path path = Paths.get("list2.csv");
        CsvMapper mapper = new CsvMapper();
        CsvSchema schema = CsvSchema.emptySchema().withHeader();
        MappingIterator<Object> it = mapper.reader(Object.class)
                .with(schema)
                .readValues(path.toFile());
        try {
            while (it.hasNext()) {
                Object row;
                try {
                    row = it.nextValue();
                } catch (IOException e) {
                    e.printStackTrace();
                    continue;
                }
            }
        } catch (ArrayIndexOutOfBoundsException e) {
            e.printStackTrace();
        }
    }
}
Exception:
com.fasterxml.jackson.core.JsonParseException: Too many entries: expected at most 3 (value #3 (0 chars) "")
at [Source: java.io.InputStreamReader@12b3519c; line: 3, column: 38]
at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1486)
at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:518)
at com.fasterxml.jackson.dataformat.csv.CsvParser._handleNextEntryExpectEOL(CsvParser.java:601)
at com.fasterxml.jackson.dataformat.csv.CsvParser._handleNextEntry(CsvParser.java:587)
at com.fasterxml.jackson.dataformat.csv.CsvParser.nextToken(CsvParser.java:474)
at com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.mapObject(UntypedObjectDeserializer.java:592)
at com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.deserialize(UntypedObjectDeserializer.java:440)
at com.fasterxml.jackson.databind.MappingIterator.nextValue(MappingIterator.java:188)
at CSVTest.main(CSVTest.java:24)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
java.lang.ArrayIndexOutOfBoundsException: 3
at com.fasterxml.jackson.dataformat.csv.CsvSchema.column(CsvSchema.java:941)
at com.fasterxml.jackson.dataformat.csv.CsvParser._handleNamedValue(CsvParser.java:614)
at com.fasterxml.jackson.dataformat.csv.CsvParser.nextToken(CsvParser.java:476)
at com.fasterxml.jackson.databind.MappingIterator.hasNextValue(MappingIterator.java:158)
at CSVTest.main(CSVTest.java:21)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
Upvotes: 7
Views: 7598
Reputation: 6289
Your CSV is not necessarily malformed; in fact, it's very common to have rows with a varying number of columns.
univocity-parsers handles this without any trouble.
The easiest way would be:
BeanListProcessor<TestBean> rowProcessor = new BeanListProcessor<TestBean>(TestBean.class);
CsvParserSettings parserSettings = new CsvParserSettings();
parserSettings.setRowProcessor(rowProcessor);
parserSettings.setHeaderExtractionEnabled(true);
CsvParser parser = new CsvParser(parserSettings);
parser.parse(new FileReader(Paths.get("list2.csv").toFile()));
// The BeanListProcessor provides a list of objects extracted from the input.
List<TestBean> beans = rowProcessor.getBeans();
If you want to discard the elements built from a row with an inconsistent number of columns, override the beanProcessed method and use the ParsingContext object to analyse your data and decide whether to keep or drop the row.
Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).
Upvotes: 2
Reputation: 116522
With Jackson 2.6, handling of readValues() has been improved to try to recover from processing errors, such that in many cases you can just try again to read the following valid rows. So make sure to use at least version 2.6.2.
Earlier versions did not recover as well, usually rendering the rest of the content unprocessable; this may be what happened in your case.
Another possibility, given that your problem is not with invalid CSV but rather with content that is not mappable to POJOs (at least the way your POJO is defined), is to read the content as a sequence of String[] and handle the mapping manually. Jackson's CSV parser itself does not mind any number of columns; it is the higher-level databinding that does not like finding "extra" content that it does not recognize.
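A minimal sketch of the manual mapping step, assuming the rows have already been read as String[] by whatever parser you use, with the header as the first array. The names `ManualRowMapper` and `mapRows` are hypothetical and not part of Jackson, the e-mail value is a placeholder, and skipping rows whose column count differs from the header is just one possible policy.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ManualRowMapper {

    // Hypothetical helper: maps each data row to header-name -> value,
    // skipping rows whose column count does not match the header.
    static List<Map<String, String>> mapRows(List<String[]> rows) {
        List<Map<String, String>> result = new ArrayList<>();
        if (rows.isEmpty()) {
            return result;
        }
        String[] header = rows.get(0);
        for (String[] row : rows.subList(1, rows.size())) {
            if (row.length != header.length) {
                System.err.println("Skipping row with " + row.length + " columns");
                continue;
            }
            Map<String, String> record = new LinkedHashMap<>();
            for (int i = 0; i < header.length; i++) {
                record.put(header[i], row[i]);
            }
            result.add(record);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"id", "name", "email"});
        // Placeholder e-mail address; the real value is not known.
        rows.add(new String[] {"121212", "Steve", "steve@example.com"});
        rows.add(new String[] {"121212", "Steve", "steve@example.com", "", ""});
        rows.add(new String[] {"121212", "Steve", "steve@example.com"});
        System.out.println(mapRows(rows).size() + " rows kept");
    }
}
```

If you would rather keep the oversized row, you could instead copy only the first `header.length` values; which choice is right depends on your data.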
Upvotes: 2
Reputation: 12214
I can't tell for certain since some of the stack trace was omitted, however:
If ArrayIndexOutOfBoundsException is the exception that is thrown (as opposed to being a "cause"), then the reason is that you catch it outside of your loop.
If it is an IOException, then as Chris Gerken wrote it may be thrown in it.hasNext(), in which case you don't catch it at all and so your program will exit.
The remainder of the stack trace would indicate which of these, or some other reason altogether, is the problem.
Update based on complete output and stack traces:
On line 24 of CSVTest.java, you call .nextValue(). In the implementation of this method, a JsonParseException is thrown. Since that is a subclass of IOException, your catch block catches it, prints the stack trace and continues with your loop. So far so good.
com.fasterxml.jackson.core.JsonParseException: Too many entries: expected at most 3 (value #3 (0 chars) "")
at [Source: java.io.InputStreamReader@12b3519c; line: 3, column: 38]
at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1486)
at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:518)
at com.fasterxml.jackson.dataformat.csv.CsvParser._handleNextEntryExpectEOL(CsvParser.java:601)
at com.fasterxml.jackson.dataformat.csv.CsvParser._handleNextEntry(CsvParser.java:587)
at com.fasterxml.jackson.dataformat.csv.CsvParser.nextToken(CsvParser.java:474)
at com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.mapObject(UntypedObjectDeserializer.java:592)
at com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.deserialize(UntypedObjectDeserializer.java:440)
at com.fasterxml.jackson.databind.MappingIterator.nextValue(MappingIterator.java:188)
at CSVTest.main(CSVTest.java:24)
After that, on line 21 of CSVTest.java, you call .hasNextValue(). In the implementation of this method, an ArrayIndexOutOfBoundsException is thrown. You catch it, and also print the stack trace. However, your catch block is outside of your loop, and so by the time you catch the exception the loop has already been exited.
java.lang.ArrayIndexOutOfBoundsException: 3
at com.fasterxml.jackson.dataformat.csv.CsvSchema.column(CsvSchema.java:941)
at com.fasterxml.jackson.dataformat.csv.CsvParser._handleNamedValue(CsvParser.java:614)
at com.fasterxml.jackson.dataformat.csv.CsvParser.nextToken(CsvParser.java:476)
at com.fasterxml.jackson.databind.MappingIterator.hasNextValue(MappingIterator.java:158)
at CSVTest.main(CSVTest.java:21)
If you really want to continue your loop here, then you will need to move that try-catch construct inside the loop. Perhaps like this:
while (true)
{
try
{
if (!it.hasNextValue())
{ break; }
}
catch (final ArrayIndexOutOfBoundsException err)
{
err.printStackTrace();
continue;
}
Object row;
try
{ row = it.nextValue(); }
catch (final IOException err)
{
err.printStackTrace();
continue;
}
}
However, this code is an infinite loop. When hasNextValue() throws an ArrayIndexOutOfBoundsException, the parser state has not changed, so the loop will never end. I include it to show the principle of moving the catch block inside the loop, not as a workable solution.
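To illustrate the same principle with a loop that does terminate, here is a minimal sketch that does not use Jackson at all. It assumes a source where fetching the next raw line always advances (a plain `Iterator<String>`), so continuing after a failed parse cannot spin forever. `SkipBadRows`, `parseRow` and the naive comma split are hypothetical stand-ins for real CSV parsing and do not handle quoted commas.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class SkipBadRows {

    // Hypothetical parser: rejects rows that do not have exactly `expected` columns.
    static String[] parseRow(String line, int expected) {
        String[] cols = line.split(",", -1); // -1 keeps trailing empty columns
        if (cols.length != expected) {
            throw new IllegalArgumentException(
                "Expected " + expected + " columns but found " + cols.length);
        }
        return cols;
    }

    static List<String[]> readAll(Iterator<String> lines, int expected) {
        List<String[]> good = new ArrayList<>();
        while (lines.hasNext()) {
            // The line is consumed before parsing, so the loop always makes progress.
            String line = lines.next();
            try {
                good.add(parseRow(line, expected));
            } catch (IllegalArgumentException e) {
                // Caught inside the loop: report and move on to the next line.
                System.err.println("Skipping: " + e.getMessage());
            }
        }
        return good;
    }

    public static void main(String[] args) {
        // Placeholder data shaped like the CSV in the question.
        List<String> lines = Arrays.asList(
            "121212,Steve,steve@example.com",
            "121212,Steve,steve@example.com,,",
            "121212,Steve,steve@example.com");
        System.out.println(readAll(lines.iterator(), 3).size() + " rows kept");
    }
}
```

The difference from the loop above is that the bad input is consumed before the exception can occur, so retrying never sees the same row twice.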
You added a comment to the question referencing discussion of error handling in jackson-dataformat-csv. It appears that you encountered a limitation (or bug) in the library when it comes to skipping malformed rows.
Upvotes: 0
Reputation: 16392
com.fasterxml.jackson.core.JsonParseException is an IOException, so that exception should be caught in the try-catch block. The fact that it is not being caught leads me to believe that it's happening in the hasNext() method. That's a common pattern: in order to know whether there is another element, you actually have to try to read the next one.
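A minimal sketch of that look-ahead pattern in plain Java, not Jackson's actual implementation. `LookAheadIterator` is a hypothetical class that parses integers from strings; because hasNext() has to parse the next element to answer, a bad element makes hasNext() itself throw, outside any try block that only wraps next().

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

public class LookAheadIterator implements Iterator<Integer> {
    private final Iterator<String> source;
    private Integer buffered; // element parsed ahead of time, or null if none

    LookAheadIterator(Iterator<String> source) {
        this.source = source;
    }

    @Override
    public boolean hasNext() {
        if (buffered == null && source.hasNext()) {
            // Parsing happens here, so NumberFormatException escapes from hasNext().
            buffered = Integer.valueOf(source.next().trim());
        }
        return buffered != null;
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        Integer value = buffered;
        buffered = null;
        return value;
    }

    public static void main(String[] args) {
        Iterator<Integer> it =
            new LookAheadIterator(Arrays.asList("1", "oops", "3").iterator());
        System.out.println(it.next());
        try {
            it.hasNext(); // tries to parse "oops"
        } catch (NumberFormatException e) {
            System.out.println("hasNext() threw: " + e.getMessage());
        }
        // In this sketch the bad element was already consumed, so "3" is parsed next.
        System.out.println(it.hasNext());
    }
}
```

Note that this sketch happens to skip the bad element when it fails; a real parser may leave its state unchanged after such a failure, which is a separate concern.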
Upvotes: 1