Yash

Reputation: 27

Column Multiplication in Spark

I'm trying to multiply two columns in Spark. Both columns are of type double. Multiplying 26.0 by 0.001 gives 0.026000000000000002 instead of 0.026. How do I resolve this?

>>> df.printSchema()
root
 |-- age: double (nullable = true)
 |-- name: string (nullable = true)
 |-- mul: double (nullable = false)


>>> df.withColumn('res', df['age']*df['mul']).show()
+----+--------+-----+--------------------+
| age|    name|  mul|                 res|
+----+--------+-----+--------------------+
|25.0|   Ankit|0.001|               0.025|
|22.0|Jalfaizy|0.001|               0.022|
|20.0| saurabh|0.001|                0.02|
|26.0|    Bala|0.001|0.026000000000000002|
+----+--------+-----+--------------------+

Thanks

Upvotes: 0

Views: 2077

Answers (3)

Cena

Reputation: 3419

These are floating-point representation errors. A simple 1.1 - 1.0 gives 0.10000000000000009 in Python (or in PySpark).
You can find more information about them in the Python documentation on floating-point arithmetic.
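For instance, the asker's exact value reproduces in a plain Python shell, since Spark's double is the same IEEE 754 type as a Python float:

>>> 1.1 - 1.0
0.10000000000000009
>>> 26.0 * 0.001
0.026000000000000002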

Rounding off to an appropriate number of decimal places seems to be the simplest solution to this problem.
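As a minimal sketch against the asker's DataFrame (column names taken from the question; 4 decimal places is an arbitrary choice), the rounding can be done at multiplication time:

import pyspark.sql.functions as F

# Round the product to 4 decimal places so the representation
# error in the trailing digits is discarded
df = df.withColumn('res', F.round(df['age'] * df['mul'], 4))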

Upvotes: 0

JOSE DANIEL FERNANDEZ

Reputation: 191

Round the column to 4 decimal places:

import pyspark.sql.functions as F

df = df.withColumn("res", F.round(F.col("res"), 4))
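Note that round uses HALF_UP rounding; pyspark.sql.functions.bround is the HALF_EVEN (banker's rounding) variant, if that matters for your data.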

Upvotes: 1

Addy

Reputation: 427

Convert it to float:

from pyspark.sql.types import StructType, StructField, FloatType

# Define the column as FloatType instead of DoubleType
table_schema = StructType([
    StructField("value", FloatType(), True)
])

df = spark.createDataFrame(
    [(0.026000000000000002,)],
    table_schema
)
df.show()
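Applied to the asker's existing DataFrame, the same idea is a cast (a sketch; note that a float simply has fewer significant digits than a double, so this hides the trailing error rather than making the arithmetic exact):

from pyspark.sql.functions import col
from pyspark.sql.types import FloatType

# Cast the double result down to float so the extra digits are not displayed
df = df.withColumn("res", col("res").cast(FloatType()))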

Upvotes: 0
