Reputation: 6220
I am trying to read the Iris dataset into Flink with its readCsvFile function, and I keep getting ParserError NUMERIC_VALUE_ILLEGAL_CHARACTER.
The CSV (with hidden characters visible) looks like this:
I've tried removing the whitespace between items in the file, without luck. Several rows cannot be read; this is the error output:
org.apache.flink.api.common.io.ParseException: Line could not be parsed: '5.4, 3.0, 4.5, 1.5, 2'
ParserError NUMERIC_VALUE_ILLEGAL_CHARACTER
Expect field types: class java.lang.Long, class java.lang.Long, class java.lang.Long, class java.lang.Long, class java.lang.Integer
in file: /home/hkr/Documents/Estudios/Máster/TFM/Desarrollo/DPASF/dpasf/target/scala-2.11/test-classes/iris.dat
[...]
org.apache.flink.api.common.io.ParseException: Line could not be parsed: '4.5, 2.3, 1.3, 0.3, 1'
ParserError NUMERIC_VALUE_ILLEGAL_CHARACTER
[...]
But when I inspect those rows, I find no strange characters, only those in the image below.
I've tried different things:
// Iris POJO
case class Iris(SepalLength: Long,
                SepalWidth: Long,
                PetalLength: Long,
                PetalWidth: Long,
                Class: Int)
val env = ExecutionEnvironment.getExecutionEnvironment
val dataSet = env.readCsvFile[Iris](getClass.getResource("/iris.dat").getPath, "\n", ",")
And different combinations of this, like changing the Class attribute to Long, or using a Tuple5, and also using readCsvFile with default arguments:
val dataSet = env.readCsvFile[(Long, Long, Long, Long, Long)](getClass.getResource("/iris.dat").getPath)
val dataSet = env.readCsvFile[(Long, Long, Long, Long, Int)](getClass.getResource("/iris.dat").getPath)
Does anybody know what may be happening? I do not know where else to look.
Upvotes: 1
Views: 417
Reputation: 21
Removing the whitespace before and after the delimiter can help solve this problem.
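A minimal sketch of that whitespace removal in plain Scala, assuming the file is small enough to clean line by line before handing it to Flink. `CleanCsv` and `cleanLine` are hypothetical names, not part of the Flink API. Note that `trim` also drops a trailing carriage return, so Windows-style line endings are handled by the same step.

```scala
import java.util.regex.Pattern

// Hypothetical helper (not part of Flink): trims whitespace around every field of a CSV line.
object CleanCsv {
  def cleanLine(line: String, delimiter: String = ","): String =
    line
      .split(Pattern.quote(delimiter), -1) // quote the delimiter so it is not read as a regex
      .map(_.trim)                         // trim removes spaces, tabs and a trailing '\r'
      .mkString(delimiter)

  def main(args: Array[String]): Unit = {
    // The two rows quoted in the error output of the question.
    val raw = Seq("5.4, 3.0, 4.5, 1.5, 2", "4.5, 2.3, 1.3, 0.3, 1")
    raw.map(cleanLine(_)).foreach(println)
  }
}
```

The cleaned lines can then be written back to a file and read with readCsvFile as before.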
Upvotes: 1
Reputation: 6220
For some reason it seems to be working now, after rewriting every row with a fresh newline and then removing all whitespace from the dataset.
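One more thing that may be worth checking, besides whitespace: the rows in the error output hold decimals such as 5.4, while the expected field types are listed as java.lang.Long. The sketch below is plain Scala and does not use Flink's own parser. It only illustrates that a decimal string does not fit an integer type, while Double fields accept it. `IrisTypes`, `parse`, and the Double-based field names are assumptions made for the illustration.

```scala
import scala.util.Try

object IrisTypes {
  // Same shape as the case class in the question, but with Double measurements (an assumption).
  case class Iris(sepalLength: Double,
                  sepalWidth: Double,
                  petalLength: Double,
                  petalWidth: Double,
                  label: Int)

  // Hypothetical parser: trims each field, returns None if any field fails to convert.
  def parse(line: String): Option[Iris] = {
    val f = line.split(",", -1).map(_.trim)
    Try(Iris(f(0).toDouble, f(1).toDouble, f(2).toDouble, f(3).toDouble, f(4).toInt)).toOption
  }

  def main(args: Array[String]): Unit = {
    // A decimal string cannot be converted to Long: the '.' is rejected.
    println(Try("5.4".toLong).isSuccess)
    // With Double fields the same row converts without error.
    println(parse("5.4, 3.0, 4.5, 1.5, 2"))
  }
}
```

If the same reasoning carries over to Flink, declaring the four measurements as Double in the case class or tuple passed to readCsvFile would be the corresponding change.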
Upvotes: 1