Prabhu
Prabhu

Reputation: 5466

Pandas non-numeric categorical dummy columns to single categorical column

I have data like following

test=pd.DataFrame( {'group':['v','w','x','y','z'],
                       'cat1':['a',np.nan,np.nan,'c',np.nan],
                       'cat2':[np.nan,'b','a',np.nan,np.nan],
                       'cat3':[np.nan,np.nan,np.nan,np.nan,'a'],
                      })

I wanted to make like following

| group | cat1 | cat2 | cat3 |
|-------|------|------|------|
|   v   |  a   | NaN  | NaN  |
|   w   | NaN  |  b   | NaN  |
|   x   | NaN  |  a   | NaN  |
|   y   |  c   | NaN  | NaN  |
|   z   | NaN  | NaN  |  a   |

to

| group | category | values |
|-------|----------|--------|
|   v   |   cat1   |   a    |
|   w   |   cat2   |   b    |
|   x   |   cat2   |   a    |
|   y   |   cat1   |   c    |
|   z   |   cat3   |   a    |

I tried using idxmax from another question in stackoverflow but that didn't do the trick.

Upvotes: 1

Views: 198

Answers (2)

cs95
cs95

Reputation: 402263

You can set_index to convert "group" to the index, then stack to remove NaNs, then you're done:

test.set_index('group').stack()

group      
v      cat1    a
w      cat2    b
x      cat2    a
y      cat1    c
z      cat3    a
dtype: object

(test.set_index('group')
     .stack()
     .reset_index(name='values')
     .rename(columns={'level_1': 'categories'}))

  group categories values
0     v       cat1      a
1     w       cat2      b
2     x       cat2      a
3     y       cat1      c
4     z       cat3      a

NOTE: Does NOT sort the rows


Another option is using pd.wide_to_long, produces similar output:

(pd.wide_to_long(test, ['cat'], i=['group'], j='category')
   .dropna()
   .reset_index()
   .rename(columns={'cat': 'values'}))

  group  category values
0     v         1      a
1     y         1      c
2     w         2      b
3     x         2      a
4     z         3      a

NOTE: Rows will be sorted by "category"

Upvotes: 3

sammywemmy
sammywemmy

Reputation: 28659

You can use melt, drop the null values and sort on the variable column

test.melt("group", var_name="category").dropna().sort_values("variable", ignore_index=True)


    group   variable    value
0     v     cat1         a
1     y     cat1         c
2     w     cat2         b
3     x     cat2         a
4     z     cat3         a

Upvotes: 2

Related Questions