Reputation: 77
I'm trying to cast a number written in scientific notation in Spark (PySpark and Spark SQL), and I get really weird, seemingly non-deterministic results.
I've even tried reverting to value * power(10, x) instead of the valueEx literal form, but I still hit the same issue: three different values are converted to the same value.
Is this some kind of scale setting in the actual conversion? Is it something in the Java layer underneath Spark?
SELECT CAST(3.57080748200000147E17 AS DECIMAL(18,0))
union all
SELECT CAST(3.570807482000001E17 AS DECIMAL(18,0))
union all
SELECT CAST(3.5708074820000013E17 AS DECIMAL(18,0))
SELECT CAST(3.57080748200000147*power(10,17) AS DECIMAL(18,0))
union all
SELECT CAST(3.570807482000001*power(10,17) AS DECIMAL(18,0))
union all
SELECT CAST(3.5708074820000013*power(10,17) AS DECIMAL(18,0))
Upvotes: 0
Views: 343
Reputation: 24458
I can see that you're left with 17 significant digits. I used to run into the same problem working with Python.
In a nutshell, double precision floating point numbers are built from whatever literal you give the programming language, and they only carry about 16-17 significant digits. So even before the cast, your number has already been changed. That's normal; people usually don't work with numbers at this precision.
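As a quick illustration (a minimal Python sketch, assuming a typical IEEE 754 platform, which is also what the JVM under Spark uses), the three literals from your question collapse to a single stored double before any cast is involved:

from decimal import Decimal

# All three literals land on the same 64-bit double, because a double
# only keeps ~15-17 significant digits.
a = 3.57080748200000147e17
b = 3.570807482000001e17
c = 3.5708074820000013e17
print(a == b == c)   # True -> they are the same stored value
print(Decimal(a))    # 357080748200000128, the exact value actually stored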
One way around it is to use strings instead of numbers; then, after the cast, you should get the correct value. Also, when working with Python, there's a decimal module dedicated to working with decimal numbers.
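For example, a rough PySpark sketch of that idea (passing the value as a string so it goes straight to DECIMAL without ever being a double):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Keep the literal as a string so it is never parsed as a double;
# Spark converts the string directly to a DECIMAL.
df = spark.sql("SELECT CAST('357080748200000147' AS DECIMAL(18,0)) AS v")
df.show(truncate=False)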
Upvotes: 1