Reputation: 27
I'm trying to multiply two columns in Spark. Both columns are of type Double. The result of multiplying 26.0 by 0.001 is 0.026000000000000002 instead of 0.026. How do I resolve this?
>>> df.printSchema()
root
|-- age: double (nullable = true)
|-- name: string (nullable = true)
|-- mul: double (nullable = false)
>>> df.withColumn('res', df['age']*df['mul']).show()
+----+--------+-----+--------------------+
| age| name| mul| res|
+----+--------+-----+--------------------+
|25.0| Ankit|0.001| 0.025|
|22.0|Jalfaizy|0.001| 0.022|
|20.0| saurabh|0.001| 0.02|
|26.0| Bala|0.001|0.026000000000000002|
+----+--------+-----+--------------------+
Thanks
Upvotes: 0
Views: 2077
Reputation: 3419
These are floating point errors. Even a simple 1.1 - 1.0 gives 0.10000000000000009 in Python (or in PySpark).
You can find more information about them here or in this Stack Overflow answer.
Rounding off to an appropriate number of decimal places seems to be the simplest solution to this problem.
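For example, the same behaviour and the rounding workaround can be reproduced in a plain Python shell (a minimal illustration, not part of the original answer):
>>> 1.1 - 1.0
0.10000000000000009
>>> 26.0 * 0.001
0.026000000000000002
>>> round(26.0 * 0.001, 4)
0.026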
Upvotes: 0
Reputation: 191
Round the column to 4 decimal places:
import pyspark.sql.functions as F
df = df.withColumn("res", F.round(F.col("res"), 4)
Upvotes: 1
Reputation: 427
Convert it to Float:
from pyspark.sql.types import StructType, StructField, FloatType

# schema with a single FloatType column
table_schema = StructType([
    StructField("value", FloatType(), True)])

df = spark.createDataFrame(
    [
        (0.026000000000000002,)
    ], table_schema
)
df.show()
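Alternatively (a sketch, not part of the original answer), an existing Double column can simply be cast to float instead of rebuilding the DataFrame; note this only changes how the value is stored and displayed, the underlying binary floating point behaviour remains:
from pyspark.sql.functions import col

# cast the computed column from double to float
df = df.withColumn("res", col("res").cast("float"))
df.show()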
Upvotes: 0