Reputation: 79
I am looking for a way to pivot around 600 columns into rows. Here's a sample with only 4 of those columns (good, bad, ok, Horrible):
df:
RecordID good bad ok Horrible
A 0 0 1 0
B 1 0 0 1
desired output:
RecordID Column Value
A Good 0
A Bad 0
A Ok 1
A Horrible 0
B Good 1
B Bad 0
B Ok 0
B Horrible 1
Upvotes: 0
Views: 205
Reputation: 2132
Adding dataframe:
import pandas as pd
import numpy as np
data2 = {'RecordID': ['a', 'b', 'c'],
'good': [0, 1, 1],
'bad': [0, 0, 1],
'horrible': [0, 1, 1],
'ok': [1, 0, 0]}
# Convert the dictionary into DataFrame
df = pd.DataFrame(data2)
Melt data: https://pandas.pydata.org/docs/reference/api/pandas.melt.html
melted = df.melt(id_vars='RecordID', var_name='Column', value_name='Value')
melted
Optionally: Group By - for summ or mean values:
f2 = melted.groupby(['Column']).sum()
df2
Upvotes: 1
Reputation: 23217
You can use .stack()
as follows. Using .stack()
is preferred as it naturally resulted in rows already sorted in the order of RecordID
so that you don't need to waste processing time sorting on it again, especially important when you have a large number of columns.
df = df.set_index('RecordID').stack().reset_index().rename(columns={'level_1': 'Column', 0: 'Value'})
Output:
RecordID Column Value
0 A good 0
1 A bad 0
2 A ok 1
3 A Horrible 0
4 B good 1
5 B bad 0
6 B ok 0
7 B Horrible 1
Upvotes: 1
Reputation: 1875
You can use melt function:
(df.melt(id_vars='RecordID', var_name='Column', value_name='Value')
.sort_values('RecordID')
.reset_index(drop=True)
)
Output:
RecordID Column Value
0 A good 0
1 A bad 0
2 A ok 1
3 A Horrible 0
4 B good 1
5 B bad 0
6 B ok 0
7 B Horrible 1
Upvotes: 1