Ashu

Reputation: 377

spark exceptAll weird behavior

Can someone help me understand this behavior:

scala> val l1 = List(84.99F, 9.99F).toDF("dec")
l1: org.apache.spark.sql.DataFrame = [dec: float]

scala> val l2 = List(84.99, 9.99).toDF("dec")
l2: org.apache.spark.sql.DataFrame = [dec: double]

scala> l1.show
+-----+
|  dec|
+-----+
|84.99|
| 9.99|
+-----+

scala> l2.show
+-----+
|  dec|
+-----+
|84.99|
| 9.99|
+-----+

scala> l1.exceptAll(l2).show(false)
+-----------------+
|dec              |
+-----------------+
|9.989999771118164|
|84.98999786376953|
+-----------------+

scala> l1.select('dec.cast("double")).exceptAll(l2).show(false)
+-----------------+
|dec              |
+-----------------+
|9.989999771118164|
|84.98999786376953|
+-----------------+

I do understand it's due to the float vs. double column comparison in exceptAll, and even casting l1's column to double explicitly (second snippet) gives the same diff, but how and where exactly does the weird diff come from?

Upvotes: 1

Views: 581

Answers (1)

Moritz

Reputation: 925

exceptAll requires Spark to widen (cast) the float column of l1 to double as well. The widened value is not the 84.99 you wrote down: the nearest float to 84.99 is slightly off, and widening it to double exposes the difference, causing the result you are seeing:

List(84.99F, 9.99F).toDF("dec")
  .select('dec.cast("double"))
  .show()

+-----------------+
|              dec|
+-----------------+
|84.98999786376953|
|9.989999771118164|
+-----------------+
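
Since those widened values are not equal to the exact doubles 84.99 and 9.99 in l2, every row of l1 survives the exceptAll.

You can reproduce the widening in plain Scala, and, as a rough workaround sketch (assuming float precision is acceptable for your comparison, and reusing the l1 and l2 from the question), narrowing l2 down to float instead lets both sides compare equal:

scala> println(84.99f.toDouble)  // nearest float to 84.99, widened to double
84.98999786376953

scala> l1.exceptAll(l2.select('dec.cast("float"))).show()
+---+
|dec|
+---+
+---+

Casting both columns to a decimal type (e.g. 'dec.cast("decimal(10,2)")) should also give an empty diff if you need exact decimal semantics.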

Upvotes: 1
