Sayed Shazeb

Reputation: 75

Pyspark Dataframe pivot and groupby count

I am working on a pyspark dataframe which looks like below

id  category
 1  A
 1  A
 1  B
 2  B
 2  A
 3  B
 3  B
 3  B

I want to unstack the category column and count their occurrences. So, the result I want is shown below

id     A  B
 1     2  1
 2     1  1
 3  Null  3

I searched online for something that could help me, but I couldn't find anything that produces this specific result.

Upvotes: 7

Views: 5738

Answers (2)

Athi

Reputation: 391

Short version: you don't need multiple groupBys.

df.groupBy("id").pivot("category").count().show()
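For comparison, the same unstack-and-count can be sketched in plain pandas with `pd.crosstab` (an assumption: pandas is available; this is an analogue, not part of the PySpark answer). Note one difference: `crosstab` fills missing combinations with 0, whereas Spark's `pivot` leaves them null.

```python
# pandas analogue of groupBy("id").pivot("category").count() -- a sketch,
# assuming pandas is installed; not part of the original PySpark answer.
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 3, 3, 3],
                   'category': ['A', 'A', 'B', 'B', 'A', 'B', 'B', 'B']})

# crosstab counts occurrences of each category per id;
# absent combinations (e.g. id=3, category='A') become 0, not null
result = pd.crosstab(df['id'], df['category'])
print(result)
```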

Upvotes: 7

Rahul

Reputation: 767

Try this (not sure it's optimized):

df = spark.createDataFrame([(1,'A'),(1,'A'),(1,'B'),(2,'B'),(2,'A'),(3,'B'),(3,'B'),(3,'B')],['id','category'])
df = df.groupBy('id','category').count()
df.groupBy('id').pivot('category').sum('count').show()
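The same two-step approach (count per (id, category) pair, then pivot the counts) can be sketched in pandas for comparison (an assumption: pandas is available; this is an analogue, not the PySpark code above):

```python
# pandas analogue of the two-step groupBy+count then pivot+sum -- a sketch,
# assuming pandas is installed; not part of the original PySpark answer.
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 3, 3, 3],
                   'category': ['A', 'A', 'B', 'B', 'A', 'B', 'B', 'B']})

# Step 1: count occurrences of each (id, category) pair
counts = df.groupby(['id', 'category']).size().reset_index(name='count')

# Step 2: pivot the counts; pairs that never occur become NaN,
# which matches the nulls Spark's pivot produces
result = counts.pivot(index='id', columns='category', values='count')
print(result)
```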


Upvotes: 4
