Reputation: 73
I want to do addition of similar type of columns (total columns are more than 100) as follows:
id | b | c | d | b_apac | c_apac | d_apac |
---|---|---|---|---|---|---|
abcd | 3 | 5 | null | 45 | 9 | 1 |
bcd | 13 | 15 | 1 | 45 | 2 | 10 |
cd | 32 | null | 6 | 45 | 90 | 1 |
resultant table should look like this:
id | b_sum | c_sum | d_sum |
---|---|---|---|
abcd | 48 | 14 | 1 |
bcd | 58 | 17 | 11 |
cd | 77 | 90 | 7 |
Please help me with some generic code as I have more than 100 columns to do this for. |
Upvotes: 0
Views: 58
Reputation: 2033
You can use use sum
and check the prefix of your column name:
df.select(
'id',
sum([df[col] for col in df.columns if col.startswith('b')]).alias('b_sum'),
sum([df[col] for col in df.columns if col.startswith('c')]).alias('c_sum'),
sum([df[col] for col in df.columns if col.startswith('d')]).alias('d_sum'),
).show(10, False)
Upvotes: 1