Reputation: 121
I have this command to round a single column of my dataframe to 2 decimal places:
data = data.withColumn("columnName1", func.round(data["columnName1"], 2))
I have no idea how to round the whole DataFrame with one command (rather than each column separately). Could somebody help me, please? I don't want to repeat the same command 50 times with different column names.
Upvotes: 3
Views: 23096
Reputation: 13581
There isn't a single built-in function that applies round to every column at once, but you can iterate over the columns. Say test.csv contains:
+-----+-----+
| col1| col2|
+-----+-----+
|1.111|2.222|
+-----+-----+
from pyspark.sql import functions as f

df = spark.read.option("header", "true").option("inferSchema", "true").csv("test.csv")
# Overwrite each column with its rounded value
for c in df.columns:
    df = df.withColumn(c, f.round(c, 2))
df.show()
+----+----+
|col1|col2|
+----+----+
|1.11|2.22|
+----+----+
Updated
You can also do this in a single select, which builds one projection instead of chaining a withColumn call per column:
from pyspark.sql import functions as f

df.select(*[f.round(c, 2).alias(c) for c in df.columns]).show()
+----+----+
|col1|col2|
+----+----+
|1.11|2.22|
+----+----+
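For reference, here is a minimal, self-contained way to reproduce this without test.csv (the column names and values are just the ones from the example above):

from pyspark.sql import SparkSession, functions as f

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1.111, 2.222)], ["col1", "col2"])

# One projection over all columns; alias() keeps the original names
df.select(*[f.round(c, 2).alias(c) for c in df.columns]).show()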
Upvotes: 11
Reputation: 403
To round only the floating-point columns and leave everything else untouched, check each column's type first:
import pyspark.sql.functions as F

# df.dtypes yields (name, type-string) pairs, e.g. ('col1', 'double')
for c_name, c_type in df.dtypes:
    if c_type in ('double', 'float'):
        df = df.withColumn(c_name, F.round(c_name, 2))
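You can also combine this with the single-select approach from the accepted answer, so non-floating-point columns pass through untouched in one projection (a sketch, assuming the same df as above):

import pyspark.sql.functions as F

# Round double/float columns; pass every other column through as-is
df = df.select(*[
    F.round(c_name, 2).alias(c_name) if c_type in ('double', 'float')
    else F.col(c_name)
    for c_name, c_type in df.dtypes
])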
Upvotes: 2