Reputation: 89
What's a good way of unpivoting a dataframe without having to hard-code the column names, I've 50+ columns that need to switch to rows. I've tried the following, given this example:
Using Python 2.7
df:
+----------+-------+-------+-------+-------+-------+
|Category |Col_A |Col_B |Col_C |Col_D |.......|
+----------+-------+-------+-------+-------+-------+
|Category1 |1 |0 |3 |0 |.......|
+----------+-------+-------+-------+-------+-------+
|Category2 |5 |0 |2 |0 |.......|
+----------+-------+-------+-------+-------+-------+
|Category3 |0 |6 |0 |4 |.......|
+----------+-------+-------+-------+-------+-------+
Code:
Determine the column (Col_X) names:
cols = df.columns[1:]
Gets the column count and creates the format 'Col_X', Col_X for each:
x = ""
cnt = 0
for c in cols:
cnt += 1
x += "'"+str(c)+"', " + str(c) + ", "
Removes last two characters:
x = x[:-2]
Creates the full expression:
xpr = "\"stack(" + str(cnt) + ", " + str(x) + ") as (Type,Value)\""
Output:
df_output = df.select("Category",expr(xpr))
But it returns something like this:
df_output.show()
+----------+-------+-------+-------+-------+-------+
|Category |stack(55, 'Col_A', Col_A, 'Col_B', Col_B....
+----------+-------+-------+-------+-------+-------+
|Category1 |
+----------+
|Category2 |
+----------+
|Category3 |
+----------+
Upvotes: 1
Views: 2122
Reputation: 89
I managed to make it work by applying this on the expression buildup:
xpr = """stack({}, {}) as (Type,Value)""".format(cnt,x)
Upvotes: 1