Reputation: 159
I have here an example DF:
+---+---------+----+---------+---------+-------------+-------------+
|id | company |type|rev2016 | rev2017 | main2016 | main2017 |
+---+---------+----+---------+---------+-------------+-------------+
| 1 | google |web | 100 | 200 | 55 | 66 |
+---+---------+----+---------+---------+-------------+-------------+
And I want this output:
+---+---------+----+-------------+------+------+
|id | company |type| Metric | 2016 | 2017 |
+---+---------+----+-------------+------+------+
| 1 | google |web | rev | 100 | 200 |
| 1 | google |web | main | 55 | 66 |
+---+---------+----+-------------+------+------+
What I am trying to achieve is transposing the revenue and maintenance columns to rows, with a new column 'Metric'. I have been trying pivot but with no luck so far.
Upvotes: 1
Views: 583
Reputation: 42332
You can construct an array of structs from the columns, then explode the array and expand the structs to get the desired output.
import pyspark.sql.functions as F

# one struct per metric, each carrying the metric name and its yearly values
struct_list = [
    F.struct(
        F.lit('rev').alias('Metric'),
        F.col('rev2016').alias('2016'),
        F.col('rev2017').alias('2017')
    ),
    F.struct(
        F.lit('main').alias('Metric'),
        F.col('main2016').alias('2016'),
        F.col('main2017').alias('2017')
    )
]

# explode the array into one row per struct, then expand the struct fields
df2 = df.withColumn(
    'arr',
    F.explode(F.array(*struct_list))
).select('id', 'company', 'type', 'arr.*')
df2.show()
+---+-------+----+------+----+----+
| id|company|type|Metric|2016|2017|
+---+-------+----+------+----+----+
| 1| google| web| rev| 100| 200|
| 1| google| web| main| 55| 66|
+---+-------+----+------+----+----+
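If there are more metrics or years, the same struct list can be built programmatically instead of by hand. This is a minimal sketch under the assumption that every metric column follows the <metric><year> naming shown in the example DF; the metrics and years lists are illustrative:

import pyspark.sql.functions as F

# assumed naming convention: columns are <metric><year>, e.g. rev2016, main2017
metrics = ['rev', 'main']
years = ['2016', '2017']

struct_list = [
    F.struct(
        F.lit(m).alias('Metric'),
        *[F.col(f'{m}{y}').alias(y) for y in years]
    )
    for m in metrics
]

df2 = df.withColumn(
    'arr', F.explode(F.array(*struct_list))
).select('id', 'company', 'type', 'arr.*')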
Or you can use stack:
df2 = df.selectExpr(
    'id', 'company', 'type',
    "stack(2, 'rev', rev2016, rev2017, 'main', main2016, main2017) as (Metric, `2016`, `2017`)"
)
df2.show()
+---+-------+----+------+----+----+
| id|company|type|Metric|2016|2017|
+---+-------+----+------+----+----+
| 1| google| web| rev| 100| 200|
| 1| google| web| main| 55| 66|
+---+-------+----+------+----+----+
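The stack expression can also be generated dynamically when there are many metric/year columns. A sketch, again assuming the <metric><year> column naming from the example DF:

# build the stack() SQL expression from the metric and year lists
metrics = ['rev', 'main']
years = ['2016', '2017']

stack_args = ', '.join(
    f"'{m}', " + ', '.join(f'{m}{y}' for y in years) for m in metrics
)
year_cols = ', '.join(f'`{y}`' for y in years)

df2 = df.selectExpr(
    'id', 'company', 'type',
    f"stack({len(metrics)}, {stack_args}) as (Metric, {year_cols})"
)

Note that stack is evaluated as SQL, so the numeric column names need backticks in the output column list.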
Upvotes: 1