Reputation: 113
I have DF test contains below columns
Type Name Country Year Value
1 Rec US 2018 8
2 fg UK 2019 2
5 vd India 2020 1
7 se US 2021 3
I wanted to make pivot on it I have tried below expression
pivotdata=spark.sql("select * from test").groupby("Country").pivot("Year").sum("Value").show()
I am getting output But it is showing me only few columns except remaining two
Country 2018 2019 2020 2021
US - -
UK - -
India - -
US - -
So what can we do if I want all columns
Upvotes: 1
Views: 633
Reputation: 1712
If I am correctly understanding what you need, you have to provide your other columns also into the sum(). Consider the example below:
tst=sqlContext.createDataFrame([('2020-04-23',1,2,"india"),('2020-04-24',1,3,"india"),('2020-04-23',1,4,"china"),('2020-04-24',1,5,"china"),('2020-04-23',1,7,"germany"),('2020-04-24',1,9,"germany")],schema=('date','quantity','value','country'))
tst.show()
+----------+--------+-----+-------+
| date|quantity|value|country|
+----------+--------+-----+-------+
|2020-04-23| 1| 2| india|
|2020-04-24| 1| 3| india|
|2020-04-23| 1| 4| china|
|2020-04-24| 1| 5| china|
|2020-04-23| 1| 7|germany|
|2020-04-24| 1| 9|germany|
+----------+--------+-----+-------+
df_pivot=tst.groupby('country').pivot('date').sum('quantity','value').show()
df_pivot.show()
+-------+------------------------+---------------------+------------------------+---------------------+
|country|2020-04-23_sum(quantity)|2020-04-23_sum(value)|2020-04-24_sum(quantity)|2020-04-24_sum(value)|
+-------+------------------------+---------------------+------------------------+---------------------+
|germany| 1| 7| 1| 9|
| china| 1| 4| 1| 5|
| india| 1| 2| 1| 3|
+-------+------------------------+---------------------+------------------------+---------------------+
If you don't like the funny column names, then you can use the agg function to define your own suffix to the pivoted column name.
tst_res=tst.groupby('country').pivot('date').agg(F.sum('quantity').alias('sum_quantity'),F.sum('value').alias('sum_value'))
tst_res.show()
+-------+-----------------------+--------------------+-----------------------+--------------------+
|country|2020-04-23_sum_quantity|2020-04-23_sum_value|2020-04-24_sum_quantity|2020-04-24_sum_value|
+-------+-----------------------+--------------------+-----------------------+--------------------+
|germany| 1| 7| 1| 9|
| china| 1| 4| 1| 5|
| india| 1| 2| 1| 3|
+-------+-----------------------+--------------------+-----------------------+--------------------+
Upvotes: 2