Reputation: 83
I have a PySpark DataFrame dd1:
colA colB Total
A A 12
A A 1
B B 45
B B 0
B B 5
C C 1
D D 12
and I want output like this (dd2):
colA colB count Total
A A 2 13
B B 3 50
C C 1 1
D D 1 12
The count column holds the number of times each (colA, colB) pair occurs, and the Total column holds the sum of Total for that pair.
Upvotes: 1
Views: 2144
Reputation: 4089
Try this:
from pyspark.sql import functions as F
# Group on both key columns, then count the rows and sum Total within each group
dd2 = dd1.groupBy('colA', 'colB').agg(F.count('colA').alias('count'), F.sum('Total').alias('Total'))
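As a sanity check on what the aggregation should return, here is a minimal sketch of the same group/count/sum logic in plain Python, with no Spark required. The names rows, groups and expected are illustrative only, and the data is the sample from the question retyped by hand.

```python
from collections import defaultdict

# Sample rows from the question, as (colA, colB, Total) tuples
rows = [
    ("A", "A", 12), ("A", "A", 1),
    ("B", "B", 45), ("B", "B", 0), ("B", "B", 5),
    ("C", "C", 1),
    ("D", "D", 12),
]

# Map each (colA, colB) key to a running [count, total] pair
groups = defaultdict(lambda: [0, 0])
for col_a, col_b, total in rows:
    groups[(col_a, col_b)][0] += 1      # how many times the pair occurred
    groups[(col_a, col_b)][1] += total  # sum of Total for the pair

# Flatten to (colA, colB, count, Total), sorted by key for a stable order
expected = [(a, b, cnt, tot) for (a, b), (cnt, tot) in sorted(groups.items())]
for row in expected:
    print(*row)
```

Note that Spark does not guarantee the row order of a groupBy result, so if dd2 must come back sorted like the table in the question, chaining .orderBy('colA') onto the aggregation should do it.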
Upvotes: 1