Piranha

Reputation: 121

Round all columns in a DataFrame to two decimal places - PySpark

I have this command to round one column of my dataframe to 2 decimal places:

from pyspark.sql import functions as func

data = data.withColumn("columnName1", func.round(data["columnName1"], 2))

I have no idea how to round the whole DataFrame with one command (rather than each column separately). Could somebody help me, please? I don't want to repeat the same command 50 times with a different column name.

Upvotes: 3

Views: 23096

Answers (2)

Lamanus

Reputation: 13581

There is no built-in function that applies round to every column at once, but you can iterate over the columns.

+-----+-----+
| col1| col2|
+-----+-----+
|1.111|2.222|
+-----+-----+

from pyspark.sql import functions as f

df = spark.read.option("header", "true").option("inferSchema", "true").csv("test.csv")

# Overwrite each column with its rounded value.
for c in df.columns:
    df = df.withColumn(c, f.round(c, 2))

df.show()

+----+----+
|col1|col2|
+----+----+
|1.11|2.22|
+----+----+

Updated

from pyspark.sql import functions as f

# Round every column in a single projection, keeping the original names.
df.select(*[f.round(c, 2).alias(c) for c in df.columns]) \
  .show()

+----+----+
|col1|col2|
+----+----+
|1.11|2.22|
+----+----+
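One advantage of the select version: it builds a single projection over the DataFrame instead of adding one plan node per withColumn call, which keeps the query plan small when rounding many columns.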

Upvotes: 11

popilla20k

Reputation: 403

To round only the floating-point columns and leave other types untouched:

import pyspark.sql.functions as F

# df.dtypes yields (column name, type string) pairs; round only double/float columns.
for c_name, c_type in df.dtypes:
    if c_type in ('double', 'float'):
        df = df.withColumn(c_name, F.round(c_name, 2))
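A minimal sketch combining this type check with the single-select approach from the answer above, so all columns are handled in one pass (the rounded name is illustrative):

import pyspark.sql.functions as F

# Round the double/float columns and pass every other column through unchanged.
rounded = df.select(*[
    F.round(F.col(c), 2).alias(c) if t in ('double', 'float') else F.col(c)
    for c, t in df.dtypes
])
rounded.show()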

Upvotes: 2
