Reputation: 51
I am new to Spark and need a help with transposing the below input dataframe into the desired output df
(Rows to Columns) using PySpark or Spark Sql.
Input Dataframe-
A B C D
1 2 3 4
10 11 12 13
......
........
Required Output (transposed) data
A 1
B 2
C 3
D 4
A 11
B 12
C 13
D 14
....
......
It is better if I can pivot the input data (columns) as per our requirement.
Upvotes: 5
Views: 3934
Reputation: 75080
You can make a generalized function like below (inspired from my previous answer here):
def stack_multiple(data,cols=None,output_columns=["col","values"]):
"""stacks multiple columns in a dataframe,
takes all columns by default unless passed a list of values"""
cols = data.columns if cols is None else cols
cols= [cols] if isinstance(cols,str) else cols
return data.selectExpr(f"""stack({len(cols)},{','.join(map(','.join,
(zip([f'"{i}"' for i in cols],[f"`{i}`" for i in cols]))))})
as ({','.join(output_columns)})""")
Sample Runs:
stack_multiple(df).show()
+---+------+
|col|values|
+---+------+
| A| 1|
| B| 2|
| C| 3|
| D| 4|
| A| 10|
| B| 11|
| C| 12|
| D| 13|
+---+------+
stack_multiple(df,['A','B'],output_columns=['A','B']).show()
+---+---+
| A| B|
+---+---+
| A| 1|
| B| 2|
| A| 10|
| B| 11|
+---+---+
Upvotes: 4
Reputation: 31470
Use stack
function in spark.
Example:
df.show()
#+---+---+---+---+
#| A| B| C| D|
#+---+---+---+---+
#| 1| 2| 3| 4|
#| 10| 11| 12| 13|
#+---+---+---+---+
from pyspark.sql.functions import *
df.selectExpr("stack(4,'A',A,'B',B,'C',C,'D',D) as (key,value)").show()
#+---+-----+
#|key|value|
#+---+-----+
#| A| 1|
#| B| 2|
#| C| 3|
#| D| 4|
#| A| 10|
#| B| 11|
#| C| 12|
#| D| 13|
#+---+-----+
Upvotes: 4
Reputation: 42342
df = spark.createDataFrame([[1,2,3,4],[10,11,12,13]]).toDF('A','B','C','D')
+---+---+---+---+
| A| B| C| D|
+---+---+---+---+
| 1| 2| 3| 4|
| 10| 11| 12| 13|
+---+---+---+---+
df.select(
F.explode(
F.create_map(
*reduce(lambda x, y: x+y,
[[F.lit(col), col] for col in df.columns]
)
)
)
)
+---+-----+
|key|value|
+---+-----+
| A| 1|
| B| 2|
| C| 3|
| D| 4|
| A| 10|
| B| 11|
| C| 12|
| D| 13|
+---+-----+
Upvotes: 3