Reputation: 75
I am working with a PySpark DataFrame that looks like this:
id | category |
---|---|
1 | A |
1 | A |
1 | B |
2 | B |
2 | A |
3 | B |
3 | B |
3 | B |
I want to unstack the category column and count the occurrences of each value, so the result I want is shown below:
id | A | B |
---|---|---|
1 | 2 | 1 |
2 | 1 | 1 |
3 | Null | 3 |
I tried finding something on the internet that could help me, but I couldn't find anything that gives this specific result.
Upvotes: 7
Views: 5738
Reputation: 391
Short version: you don't need multiple groupBy calls.
df.groupBy("id").pivot("category").count().show()
Upvotes: 7
Reputation: 767
Try this (not sure it's optimized):
df = spark.createDataFrame([(1,'A'),(1,'A'),(1,'B'),(2,'B'),(2,'A'),(3,'B'),(3,'B'),(3,'B')], ['id','category'])
# count rows per (id, category) pair, then pivot the category values into columns
df = df.groupBy('id','category').count()
df.groupBy('id').pivot('category').sum('count').show()
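As a side note, if the distinct category values are known up front, passing them to pivot() explicitly spares Spark an extra pass over the data to discover them (the column list ['A', 'B'] here is an assumption taken from the sample data):

df.groupBy('id').pivot('category', ['A', 'B']).sum('count').show()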
Upvotes: 4