Reputation: 10966
Given a dataframe like the following:
foo bar baz
a 4 x
b 3 x
c 5 y
b 5 y
a 7 z
I'd like to obtain this:
bar_a bar_b bar_c baz
4 null null x
null 3 null x
null null 5 y
null 5 null y
7 null null z
Is there some function build-in to Spark to achieve this?
Upvotes: 0
Views: 53
Reputation: 42422
You can do a pivot
:
val df2 = df.groupBy($"baz", monotonically_increasing_id().as("id"))
.pivot("foo")
.agg(first($"bar"))
.drop("id")
df2.show
+---+----+----+----+
|baz| a| b| c|
+---+----+----+----+
| y|null|null| 5|
| y|null| 5|null|
| x| 4|null|null|
| x|null| 3|null|
| z| 7|null|null|
+---+----+----+----+
If you want pretty column names:
val df3 = df2.toDF(df2.columns.map(c => if (df2.columns.drop(1).contains(c)) "bar_" + c else c):_*)
df3.show
+---+-----+-----+-----+
|baz|bar_a|bar_b|bar_c|
+---+-----+-----+-----+
| y| null| null| 5|
| y| null| 5| null|
| x| 4| null| null|
| x| null| 3| null|
| z| 7| null| null|
+---+-----+-----+-----+
Upvotes: 1