Reputation: 776
I need to merge all the values of each of the dataframe's columns into a single value per column. So the columns stay intact, but I am just summing all the respective values. For this purpose I intend to use this function:
import pyspark.sql.functions as f

def sum_col(data, col):
    return data.select(f.sum(col)).collect()[0][0]
I was now thinking of doing something like this:
data = data.map(lambda current_col: sum_col(data, current_col))
Is this doable, or do I need another way to merge all the values of the columns?
Upvotes: 3
Views: 217
Reputation: 5526
You can achieve this with the sum function:
import pyspark.sql.functions as f
df.select(*[f.sum(cols).alias(cols) for cols in df.columns]).show()
+----+---+---+
|val1| x| y|
+----+---+---+
| 36| 29|159|
+----+---+---+
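If you need the per-column totals as plain Python values rather than a one-row DataFrame, here is a minimal sketch building on the same df (the dict values shown are just the ones from the output above):

import pyspark.sql.functions as f

# collect the single row of per-column totals into a plain Python dict
sums = df.select(*[f.sum(c).alias(c) for c in df.columns]).first().asDict()
# e.g. {'val1': 36, 'x': 29, 'y': 159}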
Upvotes: 2
Reputation: 1712
To sum all your columns into a new column, you can use a list comprehension with Python's built-in sum function:
import pyspark.sql.functions as F

tst = sqlContext.createDataFrame([(10,7,14),(5,1,4),(9,8,10),(2,6,90),(7,2,30),(3,5,11)], schema=['val1','x','y'])
tst_sum = tst.withColumn("sum_col", sum([tst[coln] for coln in tst.columns]))
Results:
tst_sum.show()
+----+---+---+-------+
|val1| x| y|sum_col|
+----+---+---+-------+
| 10| 7| 14| 31|
| 5| 1| 4| 10|
| 9| 8| 10| 27|
| 2| 6| 90| 98|
| 7| 2| 30| 39|
| 3| 5| 11| 19|
+----+---+---+-------+
Note: if you have imported the sum function from PySpark as from pyspark.sql.functions import sum, it shadows Python's built-in sum, so you have to import it under a different name, for example from pyspark.sql.functions import sum as sum_pyspark.
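For example, a minimal sketch of the aliased import alongside Python's built-in sum (assuming the same tst dataframe as above):

from pyspark.sql.functions import sum as sum_pyspark  # aliased so it does not shadow Python's built-in sum

# per-column totals with the PySpark aggregate
tst.select(*[sum_pyspark(c).alias(c) for c in tst.columns]).show()

# per-row totals still use Python's built-in sum over the column expressions
tst.withColumn("sum_col", sum([tst[c] for c in tst.columns])).show()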
Upvotes: 1