Reputation: 5466
I have data like following
test=pd.DataFrame( {'group':['v','w','x','y','z'],
'cat1':['a',np.nan,np.nan,'c',np.nan],
'cat2':[np.nan,'b','a',np.nan,np.nan],
'cat3':[np.nan,np.nan,np.nan,np.nan,'a'],
})
I wanted to make like following
| group | cat1 | cat2 | cat3 |
|-------|------|------|------|
| v | a | NaN | NaN |
| w | NaN | b | NaN |
| x | NaN | a | NaN |
| y | c | NaN | NaN |
| z | NaN | NaN | a |
to
| group | category | values |
|-------|----------|--------|
| v | cat1 | a |
| w | cat2 | b |
| x | cat2 | a |
| y | cat1 | c |
| z | cat3 | a |
I tried using idxmax
from another question in stackoverflow but that didn't do the trick.
Upvotes: 1
Views: 198
Reputation: 402263
You can set_index
to convert "group" to the index, then stack
to remove NaNs, then you're done:
test.set_index('group').stack()
group
v cat1 a
w cat2 b
x cat2 a
y cat1 c
z cat3 a
dtype: object
(test.set_index('group')
.stack()
.reset_index(name='values')
.rename(columns={'level_1': 'categories'}))
group categories values
0 v cat1 a
1 w cat2 b
2 x cat2 a
3 y cat1 c
4 z cat3 a
NOTE: Does NOT sort the rows
Another option is using pd.wide_to_long
, produces similar output:
(pd.wide_to_long(test, ['cat'], i=['group'], j='category')
.dropna()
.reset_index()
.rename(columns={'cat': 'values'}))
group category values
0 v 1 a
1 y 1 c
2 w 2 b
3 x 2 a
4 z 3 a
NOTE: Rows will be sorted by "category"
Upvotes: 3
Reputation: 28659
You can use melt, drop the null values and sort on the variable
column
test.melt("group", var_name="category").dropna().sort_values("variable", ignore_index=True)
group variable value
0 v cat1 a
1 y cat1 c
2 w cat2 b
3 x cat2 a
4 z cat3 a
Upvotes: 2