Reputation: 8967
Is there a function for adding the values of multiple integer columns to create a new column?
For example: summing several count columns into a single total count column.
As far as I can tell, concat only works for string columns.
Upvotes: 2
Views: 821
Reputation: 25435
There are two easy ways of doing that. The first is simply using + and typing the column names out; the other is using a combination of operator.add and functools.reduce to sum many columns at once.
Below is an example showing both ways of summing all columns that have an x in their name (so column y1 is not included in our total).
Hope this helps!
import functools
import operator

import pandas as pd
import pyspark.sql.functions as F
from pyspark.sql import SparkSession

# Reuse the active session if there is one (e.g. in a notebook or the pyspark shell)
spark = SparkSession.builder.getOrCreate()

# SAMPLE DATA -----------------------------------------------------------------------
pdf = pd.DataFrame({'x1': [0, 0, 0, 1, 1],
                    'x2': [6, 5, 4, 3, 2],
                    'x3': [2, 2, 2, 2, 2],
                    'y1': [1, 1, 1, 1, 1]})
df = spark.createDataFrame(pdf)

# Sum by typing the column names explicitly
df = df.withColumn('total_1', F.col('x1') + F.col('x2') + F.col('x3'))

# Sum many columns without typing them out using reduce
# ('total_1' contains no 'x', so it is not picked up here)
cols_to_sum = [c for c in df.columns if 'x' in c]
df = df.withColumn('total_2', functools.reduce(operator.add, [F.col(c) for c in cols_to_sum]))

df.show()
Output:
+---+---+---+---+-------+-------+
| x1| x2| x3| y1|total_1|total_2|
+---+---+---+---+-------+-------+
|  0|  6|  2|  1|      8|      8|
|  0|  5|  2|  1|      7|      7|
|  0|  4|  2|  1|      6|      6|
|  1|  3|  2|  1|      6|      6|
|  1|  2|  2|  1|      5|      5|
+---+---+---+---+-------+-------+
+---+---+---+---+-------+-------+
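If it is not obvious what reduce does with operator.add, here is a rough sketch in plain Python that runs without Spark. The Expr class is a hypothetical stand-in I made up for illustration (it is not part of PySpark); it only mimics the way + on a Column builds a larger expression. I also used startswith instead of the substring check, which is my own assumption about what "x columns" should mean.

```python
import functools
import operator


class Expr:
    """Hypothetical stand-in for a Spark Column: `+` builds an expression string."""

    def __init__(self, text):
        self.text = text

    def __add__(self, other):
        return Expr(f"({self.text} + {other.text})")


columns = ["x1", "x2", "x3", "y1"]
# Assumption: select by prefix rather than by substring
cols_to_sum = [c for c in columns if c.startswith("x")]

# reduce folds the list from left to right: ((x1 + x2) + x3)
total = functools.reduce(operator.add, [Expr(c) for c in cols_to_sum])
print(total.text)  # ((x1 + x2) + x3)

# The same fold on the first sample row (x1=0, x2=6, x3=2)
row = {"x1": 0, "x2": 6, "x3": 2, "y1": 1}
row_total = functools.reduce(operator.add, [row[c] for c in cols_to_sum])
print(row_total)  # 8
```

So the reduce version ends up building the same chained + expression as the first approach, just without typing the names out.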
Upvotes: 1