user3820901

Reputation: 25

PySpark: Pivot column into single row

I am trying to pivot a simple DataFrame in PySpark and I must be missing something. I have a DataFrame df of the form:

+----+----+
|Item| Key|
+----+----+
|   1|   A|
+----+----+
|   2|   A|
+----+----+

I attempt to pivot it on Item like this:

from pyspark.sql.functions import first

df.groupBy("Item").\
        pivot("Item", ["1","2"]).\
        agg(first("Key"))

and I receive:

+----+----+----+
|Item|   1|   2|
+----+----+----+
|   1|   A|null|
+----+----+----+
|   2|null|   A|
+----+----+----+

But what I want is:

+----+----+
|   1|   2|
+----+----+
|   A|   A|
+----+----+

How do I keep the Item column out of my pivoted output? I assume it is what messes up my result. I am running Spark 2.3.2 and Python 3.7.0.

Upvotes: 0

Views: 1156

Answers (1)

Ali Yesilli

Reputation: 2200

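Try it without specifying a grouping column. Calling groupBy() with no arguments puts all rows into a single group, so the pivot produces one row: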

>>> df.show()
+----+---+
|Item|Key|
+----+---+
|   1|  A|
|   2|  A|
+----+---+

>>> from pyspark.sql.functions import first
>>> df.groupBy().pivot("Item", ["1","2"]).agg(first("Key")).show()
+---+---+
|  1|  2|
+---+---+
|  A|  A|
+---+---+
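
To see why the grouping column matters, here is a small plain-Python model of what groupBy(...).pivot(...).agg(first(...)) does to the two rows above. It is only a sketch of the semantics, not Spark's implementation, and the helper name pivot_first is made up for illustration. Grouping on Item creates one group per distinct Item, so each output row can fill only its own pivot column; grouping on nothing creates a single group that fills both.

```python
from collections import defaultdict


def pivot_first(rows, group_cols, pivot_col, pivot_values, value_col):
    """Hypothetical helper: models groupBy(*group_cols)
    .pivot(pivot_col, pivot_values).agg(first(value_col)) on a list of dicts."""
    groups = defaultdict(dict)
    for row in rows:
        # One output row per distinct combination of grouping values.
        group_key = tuple(row[c] for c in group_cols)
        cells = groups[group_key]
        # Pivot values are given as strings, so compare on the string form.
        column = str(row[pivot_col])
        if column in pivot_values:
            # Keep only the first value seen for this (group, column) cell.
            cells.setdefault(column, row[value_col])

    result = []
    for group_key, cells in groups.items():
        out = dict(zip(group_cols, group_key))
        for value in pivot_values:
            # Cells that never received a value stay None (null in Spark).
            out[value] = cells.get(value)
        result.append(out)
    return result


rows = [{"Item": 1, "Key": "A"}, {"Item": 2, "Key": "A"}]

# Grouping on Item: two groups, so two rows with a diagonal of values.
by_item = pivot_first(rows, ["Item"], "Item", ["1", "2"], "Key")

# No grouping columns: a single group, so a single row.
no_group = pivot_first(rows, [], "Item", ["1", "2"], "Key")

print(by_item)
print(no_group)
```

With no grouping columns every row shares the same empty group key, which is why the single-row result contains both the 1 and 2 columns.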

Upvotes: 1
