suraj jadhav
suraj jadhav

Reputation: 73

How to do similar type of columns addition in Pyspark?

I want to do addition of similar type of columns (total columns are more than 100) as follows:

id b c d b_apac c_apac d_apac
abcd 3 5 null 45 9 1
bcd 13 15 1 45 2 10
cd 32 null 6 45 90 1

resultant table should look like this:

id b_sum c_sum d_sum
abcd 48 14 1
bcd 58 17 11
cd 77 90 7

Please help me with some generic code as I have more than 100 columns to do this for. |

Upvotes: 0

Views: 58

Answers (1)

Jonathan
Jonathan

Reputation: 2033

You can use use sum and check the prefix of your column name:

df.select(
    'id',
    sum([df[col] for col in df.columns if col.startswith('b')]).alias('b_sum'),
    sum([df[col] for col in df.columns if col.startswith('c')]).alias('c_sum'),
    sum([df[col] for col in df.columns if col.startswith('d')]).alias('d_sum'),
).show(10, False)

Upvotes: 1

Related Questions