Alejandro Alcalde

Reputation: 6220

Flink ReadCsvFile ParserError NUMERIC_VALUE_ILLEGAL_CHARACTER

I am trying to read the Iris dataset into Flink with its readCsvFile function, and I keep getting ParserError NUMERIC_VALUE_ILLEGAL_CHARACTER.

The CSV (with hidden characters visible) looks like this:

[Screenshot of the CSV with hidden characters visible; image not included]

I've tried removing the whitespace between items in the file, without luck. Multiple rows cannot be read; this is the error output:

org.apache.flink.api.common.io.ParseException: Line could not be parsed: '5.4, 3.0, 4.5, 1.5, 2'
ParserError NUMERIC_VALUE_ILLEGAL_CHARACTER 
Expect field types: class java.lang.Long, class java.lang.Long, class java.lang.Long, class java.lang.Long, class java.lang.Integer 
in file: /home/hkr/Documents/Estudios/Máster/TFM/Desarrollo/DPASF/dpasf/target/scala-2.11/test-classes/iris.dat
[...]
org.apache.flink.api.common.io.ParseException: Line could not be parsed: '4.5, 2.3, 1.3, 0.3, 1'
ParserError NUMERIC_VALUE_ILLEGAL_CHARACTER
[...]

But when I inspect those rows, I find no strange characters, only those shown in the image below.

[Screenshot of the failing rows with hidden characters visible; image not included]

I've tried different things:

// Iris POJO
case class Iris(SepalLength:Long,
  SepalWidth:Long,
  PetalLength:Long,
  PetalWidth:Long,
  Class:Int)

val env = ExecutionEnvironment.getExecutionEnvironment
val dataSet = env.readCsvFile[Iris](getClass.getResource("/iris.dat").getPath, "\n", ",")

I've also tried different combinations of this, such as changing the Class attribute to Long, using a Tuple5, and calling readCsvFile with its default arguments:

val dataSet = env.readCsvFile[(Long, Long, Long, Long, Long)](getClass.getResource("/iris.dat").getPath)
val dataSet = env.readCsvFile[(Long, Long, Long, Long, Int)](getClass.getResource("/iris.dat").getPath)

Does anybody know what may be happening? I do not know where else to look.

Upvotes: 1

Views: 417

Answers (2)

lycbug666

Reputation: 21

Removing the whitespace before and after the delimiter can help solve this problem.
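One way to do that cleanup before handing the file to Flink is a small preprocessing step. The following is only a sketch in plain Scala (no Flink calls); the names `CsvCleaner`, `cleanLine` and `cleanFile` are made up for illustration, and it assumes a comma delimiter and a UTF-8 file that fits in memory.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}
import java.util.regex.Pattern
import scala.collection.JavaConverters._

// Hypothetical helper, not part of Flink.
object CsvCleaner {
  // Trim every field so no whitespace (including a stray '\r')
  // is left before or after the delimiter.
  def cleanLine(line: String, delimiter: String = ","): String =
    line.split(Pattern.quote(delimiter), -1).map(_.trim).mkString(delimiter)

  // Rewrite the file with cleaned rows, dropping blank lines and
  // ending every row with a plain "\n".
  def cleanFile(in: String, out: String): Unit = {
    val lines = Files.readAllLines(Paths.get(in), StandardCharsets.UTF_8).asScala
    val cleaned = lines.map(l => cleanLine(l)).filter(_.nonEmpty)
    Files.write(Paths.get(out), cleaned.mkString("", "\n", "\n").getBytes(StandardCharsets.UTF_8))
  }
}
```

For example, `CsvCleaner.cleanLine("5.4, 3.0, 4.5, 1.5, 2")` yields `5.4,3.0,4.5,1.5,2`; the cleaned file can then be passed to readCsvFile as in the question.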

Upvotes: 1

Alejandro Alcalde

Reputation: 6220

For some reason it seems to be working now, after rewriting the line break at the end of every row and then removing all whitespace from the dataset.
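It may also be worth checking the declared field types against the data: the error output lists Long as the expected type for the first four fields, while the rows contain decimal values such as 5.4. The sketch below uses plain Scala conversions, not Flink's own parsers, so it only illustrates the mismatch; `FieldTypeCheck` and its methods are hypothetical names.

```scala
import scala.util.Try

// Hypothetical helper, not part of Flink.
object FieldTypeCheck {
  // True if the trimmed token is a valid integer literal.
  def parsesAsLong(token: String): Boolean = Try(token.trim.toLong).isSuccess

  // True if the trimmed token is a valid floating-point literal.
  def parsesAsDouble(token: String): Boolean = Try(token.trim.toDouble).isSuccess

  def main(args: Array[String]): Unit = {
    // One of the rows reported in the error output.
    val row = "5.4, 3.0, 4.5, 1.5, 2"
    row.split(",").foreach { t =>
      println(s"'${t.trim}': long=${parsesAsLong(t)}, double=${parsesAsDouble(t)}")
    }
  }
}
```

With this row, only the last token converts to a Long, while all five convert to a Double. If the same holds for the real file, declaring the four measurements as Double instead of Long in the case class or tuple would be the matching change.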

Upvotes: 1
