Yash

Reputation: 27

Column Multiplication in Spark

I'm trying to multiply two columns in Spark. Both columns are of type double. Multiplying 26.0 by 0.001 gives 0.026000000000000002 instead of 0.026. How do I resolve this?

>>> df.printSchema()
root
 |-- age: double (nullable = true)
 |-- name: string (nullable = true)
 |-- mul: double (nullable = false)


>>> df.withColumn('res', df['age']*df['mul']).show()
+----+--------+-----+--------------------+
| age|    name|  mul|                 res|
+----+--------+-----+--------------------+
|25.0|   Ankit|0.001|               0.025|
|22.0|Jalfaizy|0.001|               0.022|
|20.0| saurabh|0.001|                0.02|
|26.0|    Bala|0.001|0.026000000000000002|
+----+--------+-----+--------------------+

Thanks

Upvotes: 0

Views: 2077

Answers (3)

Cena

Reputation: 3419

These are floating-point representation errors. A simple 1.1 - 1.0 gives 0.10000000000000009 in Python (or in PySpark).
You can find more information about them in the Python documentation on floating-point arithmetic.
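For instance, the asker's exact value reproduces in a plain Python shell, since Spark's double is the same IEEE 754 type as a Python float:

>>> 1.1 - 1.0
0.10000000000000009
>>> 26.0 * 0.001
0.026000000000000002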

Rounding off to an appropriate number of decimal places seems to be the simplest solution to this problem.
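As a minimal sketch against the asker's DataFrame (column names taken from the question; 4 decimal places is an arbitrary choice), the rounding can be done at multiplication time:

import pyspark.sql.functions as F

# Round the product to 4 decimal places so the representation
# error in the trailing digits is discarded
df = df.withColumn('res', F.round(df['age'] * df['mul'], 4))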

Upvotes: 0

JOSE DANIEL FERNANDEZ

Reputation: 191

Round the column to 4 decimal places:

import pyspark.sql.functions as F

df = df.withColumn("res", F.round(F.col("res"), 4))
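Note that round uses HALF_UP rounding; pyspark.sql.functions.bround is the HALF_EVEN (banker's rounding) variant, if that matters for your data.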

Upvotes: 1

Addy

Reputation: 427

Convert it to float:

from pyspark.sql.types import StructType, StructField, FloatType

# Define the column as FloatType instead of DoubleType
table_schema = StructType([
    StructField("value", FloatType(), True)
])

df = spark.createDataFrame(
    [(0.026000000000000002,)],
    table_schema
)
df.show()
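Applied to the asker's existing DataFrame, the same idea is a cast (a sketch; note that a float simply has fewer significant digits than a double, so this hides the trailing error rather than making the arithmetic exact):

from pyspark.sql.functions import col
from pyspark.sql.types import FloatType

# Cast the double result down to float so the extra digits are not displayed
df = df.withColumn("res", col("res").cast(FloatType()))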

Upvotes: 0
