I want to do addition of similar type of columns (total columns are more than 100) as follows: id b c d b_apac c_apac d_apac abcd 3 5 null 45 9 1 bcd 13 15 1 45 2 10 cd 32 null 6 45 90 1 resultant table should look like this: id b_sum c_sum d_sum abcd 48 14 1 bcd 58 17 11 cd 77 90 7 Please help me with some generic code as I have more than 100 columns to do this for. |

Reputation: 73

How to do similar type of columns addition in Pyspark?

I want to do addition of similar type of columns (total columns are more than 100) as follows:

id	b	c	d	b_apac	c_apac	d_apac
abcd	3	5	null	45	9	1
bcd	13	15	1	45	2	10
cd	32	null	6	45	90	1

resultant table should look like this:

id	b_sum	c_sum	d_sum
abcd	48	14	1
bcd	58	17	11
cd	77	90	7

Please help me with some generic code as I have more than 100 columns to do this for. |

Upvotes: 0

Answers (1)

Jonathan

Reputation: 2033

You can use use sum and check the prefix of your column name:

df.select(
    'id',
    sum([df[col] for col in df.columns if col.startswith('b')]).alias('b_sum'),
    sum([df[col] for col in df.columns if col.startswith('c')]).alias('c_sum'),
    sum([df[col] for col in df.columns if col.startswith('d')]).alias('d_sum'),
).show(10, False)

Upvotes: 1

How to do similar type of columns addition in Pyspark?

Answers (1)

Related Questions