Reputation: 25
I am trying to pivot a simple dataframe in pyspark and I must be missing something. I have a dataframe df in the form of:
+----+----+
|Item| Key|
+----+----+
|   1|   A|
+----+----+
|   2|   A|
+----+----+
I attempt to pivot it on Item like this:
from pyspark.sql.functions import first

df.groupBy("Item").\
    pivot("Item", ["1", "2"]).\
    agg(first("Key"))
and I receive:
+----+----+----+
|Item|   1|   2|
+----+----+----+
|   1|   A|null|
+----+----+----+
|   2|null|   A|
+----+----+----+
But what I want is:
+----+----+
|   1|   2|
+----+----+
|   A|   A|
+----+----+
How do I keep the Item column out of the pivoted output? I assume it is what messes up my result. I am running Spark 2.3.2 and Python 3.7.0.
Upvotes: 0
Views: 1156
Reputation: 2200
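Try it without specifying a grouping column. Calling groupBy() with no arguments puts every row into a single group, so the pivot produces one row and no Item column: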
>>> df.show()
+----+---+
|Item|Key|
+----+---+
|   1|  A|
|   2|  A|
+----+---+
>>> df.groupBy().pivot("Item", ["1","2"]).agg(first("Key")).show()
+---+---+
|  1|  2|
+---+---+
|  A|  A|
+---+---+
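To see why the grouping column makes the difference, here is a rough plain-Python emulation of the group/pivot/first steps. It is only a sketch: pivot_first is a hypothetical helper written for this illustration, it is not part of PySpark, and it does not reflect how Spark implements pivot internally. It assumes Item values are strings, matching the pivot values passed above.

```python
# Plain-Python emulation of groupBy(...).pivot(...).agg(first(...)).
# pivot_first is a hypothetical helper for illustration, not a PySpark API.

def pivot_first(rows, group_cols, pivot_col, pivot_values, value_col):
    """Group rows by group_cols, then spread value_col across pivot_values."""
    groups = {}
    for row in rows:
        # Each distinct grouping key becomes exactly one output row.
        key = tuple(row[c] for c in group_cols)
        cells = groups.setdefault(key, {})
        pv = row[pivot_col]
        # Keep only the first value seen for each pivot cell.
        if pv in pivot_values and pv not in cells:
            cells[pv] = row[value_col]
    # Cells that never received a value come out as None (null in Spark).
    return [
        {**dict(zip(group_cols, key)), **{v: cells.get(v) for v in pivot_values}}
        for key, cells in groups.items()
    ]


rows = [{"Item": "1", "Key": "A"}, {"Item": "2", "Key": "A"}]

# Grouping by the pivot column: one output row per Item, so each row
# can only fill its own pivot cell and the rest stay None.
by_item = pivot_first(rows, ["Item"], "Item", ["1", "2"], "Key")

# No grouping columns: all input rows share one group, hence one output row.
no_group = pivot_first(rows, [], "Item", ["1", "2"], "Key")

print(by_item)
print(no_group)
```

With "Item" as the grouping column, every distinct Item gets its own output row, which is the diagonal result from the question. With no grouping columns, both input rows fill cells of the same single row.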
Upvotes: 1