KK2486
KK2486

Reputation: 353

Spark 2.4 CSV Load Issue with option "nullvalue"

We were using Spark 2.3 before, now we're on 2.4:

Spark version 2.4.0
Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_212)

We had a piece of code running in production that converted csv files to parquet format. One of the options we had set csv load is option("nullValue", null). There's something wrong with how it works in spark 2.4.

Here's an example to show the issue.

  1. let's create the following /tmp/test.csv file:
C0,C1,C2,C3,C4,C5
1,"1234",0.00,"","D",0.00
2,"",0.00,"","D",0.00

  1. Now if we load it in spark-shell
scala> val data1 = spark.read.option("header", "true").option("inferSchema", "true").option("treatEmptyValuesAsNulls","true").option("nullValue", null).csv("file:///tmp/test.csv")

we get an empty row:
scala> data1.show
+----+----+----+----+----+----+
| C0| C1| C2| C3| C4| C5|
+----+----+----+----+----+----+
| 1|1234| 0.0| | D| 0.0|
|null|null|null|null|null|null|
+----+----+----+----+----+----+

  1. If we additionally change the csv a little (replaced empty string with "1" in the last row)
C0,C1,C2,C3,C4,C5
1,"1234",0.00,"","D",0.00
2,"",0.00,"1","D",0.00

the result is even worse:

scala> val data2 = spark.read.option("header", "true").option("inferSchema", "true").option("treatEmptyValuesAsNulls","true").option("nullValue", null).csv("file:///tmp/test.csv")

scala> data2.show
+----+----+----+----+----+----+
| C0| C1| C2| C3| C4| C5|
+----+----+----+----+----+----+
|null|null|null|null|null|null|
|null|null|null|null|null|null|
+----+----+----+----+----+----+

Is this a bug in new version of spark 2.4.0 ? Any body faced similar issue ?

Upvotes: 2

Views: 16519

Answers (1)

KK2486
KK2486

Reputation: 353

spark option emptyValue solved issue

val data2 = spark.read.option("header", "true")
     .option("inferSchema", "true")
     .option("treatEmptyValuesAsNulls","true")
     .option("nullValue", null)***
     .option("emptyValue", null)***
     .csv("file:///tmp/test.csv")

Upvotes: 4

Related Questions