Shankar

Reputation: 8967

How to add multiple Integer column values and create a single column

Is there a function for adding the values of multiple integer columns to create a new column?

For example, combining multiple count columns into a single total count column.

I believe concat works only for string columns.

Upvotes: 2

Views: 821

Answers (1)

Florian

Reputation: 25435

There are two easy ways to do that. The first is to use + and type the column names out; the second is to combine operator.add with functools.reduce to sum many columns at once.

Below is an example showing both ways to sum all columns that have an x in their name (so column y1 is not included in our total).

Hope this helps!

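import pyspark.sql.functions as F
import pandas as pd
from pyspark.sql import SparkSession

# Reuse the active session (e.g. in a notebook or the pyspark shell) or create one
spark = SparkSession.builder.getOrCreate()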

# SAMPLE DATA -----------------------------------------------------------------------
df = pd.DataFrame({'x1': [0,0,0,1,1],
                   'x2': [6,5,4,3,2],
                   'x3': [2,2,2,2,2],
                   'y1': [1,1,1,1,1]})
df = spark.createDataFrame(df)

# Sum by typing the column names explicitly
df = df.withColumn('total_1',F.col('x1') + F.col('x2') + F.col('x3'))

# Sum many columns without typing them out using reduce
import operator
import functools
cols_to_sum = [col for col in df.columns if 'x' in col] 
df = df.withColumn('total_2',functools.reduce(operator.add, [F.col(x) for x in cols_to_sum]))

df.show()

Output:

+---+---+---+---+-------+-------+
| x1| x2| x3| y1|total_1|total_2|
+---+---+---+---+-------+-------+
|  0|  6|  2|  1|      8|      8|
|  0|  5|  2|  1|      7|      7|
|  0|  4|  2|  1|      6|      6|
|  1|  3|  2|  1|      6|      6|
|  1|  2|  2|  1|      5|      5|
+---+---+---+---+-------+-------+
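
To make the reduce step less magical, here is a plain-Python sketch of what the fold does. The Expr class is a hypothetical stand-in for a Spark Column (it is not part of pyspark); it only records how + gets applied, so the snippet runs without Spark.

```python
import functools
import operator


class Expr:
    """Hypothetical stand-in for a Spark Column: '+' builds an expression string."""

    def __init__(self, text):
        self.text = text

    def __add__(self, other):
        return Expr(f"({self.text} + {other.text})")


columns = ["x1", "x2", "x3", "y1"]
cols_to_sum = [name for name in columns if "x" in name]  # drops y1

# reduce folds left to right, building ((x1 + x2) + x3)
total = functools.reduce(operator.add, [Expr(name) for name in cols_to_sum])
print(total.text)

# The same fold applied to the x-values of the first sample row (0, 6, 2)
row_total = functools.reduce(operator.add, [0, 6, 2])
print(row_total)
```

So total_2 is the same expression as total_1, just built programmatically. Note that reduce raises a TypeError on an empty list unless you pass an initial value; in Spark, something like F.lit(0) as the third argument should work as a starting point if the column filter might match nothing.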

Upvotes: 1
