Reputation: 47
I have a dataframe with the columns Title and bin:
+-------------------+---+
|              Title|bin|
+-------------------+---+
|Forrest Gump (1994)|  3|
|Pulp Fiction (1994)|  2|
| Matrix, The (1999)|  3|
|   Toy Story (1995)|  1|
|  Fight Club (1999)|  3|
+-------------------+---+
How do I count the occurrences of each bin value and put each count into its own column of a new dataframe using PySpark? For instance:
+-----------+-----------+-----------+
|count(bin1)|count(bin2)|count(bin3)|
+-----------+-----------+-----------+
|          1|          1|          3|
+-----------+-----------+-----------+
Is this possible? Would someone please help me with this if you know how?
Upvotes: 1
Views: 411
Reputation: 32640
Group by bin and count, then pivot the bin column, and rename the columns of the resulting dataframe if you want:
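import pyspark.sql.functions as F
# Count rows per bin, then pivot the bin values into columns (a single row in total)
df1 = df.groupBy("bin").count().groupBy().pivot("bin").agg(F.first("count"))
# Prefix each pivoted column name (the bin value) with "count_bin"
df1 = df1.toDF(*[f"count_bin{c}" for c in df1.columns])
df1.show()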
#+----------+----------+----------+
#|count_bin1|count_bin2|count_bin3|
#+----------+----------+----------+
#|         1|         1|         3|
#+----------+----------+----------+
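To make the group, count, and pivot steps above concrete, here is a minimal plain-Python sketch that applies the same logic to the sample rows from the question, without Spark. The helper name count_bins_wide is hypothetical and only for illustration; it is not part of PySpark, and it mirrors the idea rather than the Spark execution.

```python
from collections import Counter

# Sample rows copied from the question: (Title, bin)
rows = [
    ("Forrest Gump (1994)", 3),
    ("Pulp Fiction (1994)", 2),
    ("Matrix, The (1999)", 3),
    ("Toy Story (1995)", 1),
    ("Fight Club (1999)", 3),
]


def count_bins_wide(rows):
    """Hypothetical helper: count rows per bin and return one 'wide' row."""
    # Step 1: group by bin and count the rows in each group
    counts = Counter(bin_value for _, bin_value in rows)
    # Steps 2 and 3: pivot the counts into a single row, one renamed
    # column per bin value, ordered by bin
    return {f"count_bin{b}": counts[b] for b in sorted(counts)}


wide_row = count_bins_wide(rows)
print(wide_row)
```

The resulting dictionary plays the role of the single row in df1: one key per distinct bin value, each holding that bin's row count.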
Upvotes: 2