Pandas - Aggregating several columns into one

Question

I have a dataframe with several categorical columns, and I want to aggregate all these into a single categorical column, preferably using Pandas.

For an example, if I have two columns, named category1 (c1) and category2 (c2), both with data that range from 0 to 2, I want to aggregate them in some other column category (c), which can range from 0 to 5, representing all the possible categorical values combinations.

I would go from this:

d1 d2 c1 c2
1  1  NA 0
2  1  1  1
3  1  0  2
4  2  2  NA
5  1  NA NA
6  2  2  2
7  2  0  NA
8  2  0  2

To this:

I tried following this, but it didn't seem to work and threw some errors, namely ValueError: cannot reindex from a duplicate axis.

I appreciate in advance any help.

rafaelc · Accepted Answer

IIUC, you can use ngroup with groupby.

df['c'] = df.fillna(-1).groupby(['c1', 'c2']).ngroup()

The order might be arbitrary (i.e. not same as yours), but hopefully that's not important.

   d1  d2  c
0   1   1  1
1   2   1  4
2   3   1  3
3   4   2  5
4   5   1  0
5   6   2  6
6   7   2  2
7   8   2  3

Pandas - Aggregating several columns into one

Answers (1)

Related Questions