Reputation: 89
I am trying to pivot a Spark DataFrame with columns that hold a foreign key to another table. All such columns are named FK_<column_name>, and there can be one or more of them.
I want to pivot all columns whose names start with FK_ into rows in a single column so I can join with the other table. I don't need the original column names in another column, but if the pivot operation produces them, that is fine as well.
Example table I have:
id  name   dept  FK_column1  FK_column2  FK_Column3
1   Alpha  ABC   101         102         103
2   Bravo  CDE   104         105         106
Output I am looking for:
id  name   dept  foreign_keys
1   Alpha  ABC   101
1   Alpha  ABC   102
1   Alpha  ABC   103
2   Bravo  CDE   104
2   Bravo  CDE   105
2   Bravo  CDE   106
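For reference, a minimal sketch to reproduce the sample DataFrame above (assuming an active SparkSession; the variable names are just placeholders):
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# sample data matching the table above
df = spark.createDataFrame(
    [(1, "Alpha", "ABC", 101, 102, 103),
     (2, "Bravo", "CDE", 104, 105, 106)],
    ["id", "name", "dept", "FK_column1", "FK_column2", "FK_Column3"],
)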
Upvotes: 1
Views: 724
Reputation: 32710
You can get the list of columns whose names start with FK_ and build a stack expression to unpivot the DataFrame:
# columns whose names start with FK_
fk_cols = [c for c in df.columns if c.startswith("FK_")]

# e.g. stack(3,'column1',FK_column1,'column2',FK_column2,'Column3',FK_Column3) as (FK, foreign_keys)
stack_expr = f"stack({len(fk_cols)}," + ','.join(
    [f"'{c.replace('FK_', '')}',{c}" for c in fk_cols]
) + ") as (FK, foreign_keys)"

df.selectExpr("id", "name", "dept", stack_expr).show()
#+---+-----+----+-------+------------+
#| id| name|dept| FK|foreign_keys|
#+---+-----+----+-------+------------+
#| 1|Alpha| ABC|column1| 101|
#| 1|Alpha| ABC|column2| 102|
#| 1|Alpha| ABC|Column3| 103|
#| 2|Bravo| CDE|column1| 104|
#| 2|Bravo| CDE|column2| 105|
#| 2|Bravo| CDE|Column3| 106|
#+---+-----+----+-------+------------+
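Since you said you don't need the column names, you can simply drop the FK column from the result (reusing df and stack_expr from above):
# keep only id, name, dept and foreign_keys
df.selectExpr("id", "name", "dept", stack_expr).drop("FK").show()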
Upvotes: 3