pandas Grouping based on two variables

Question

+---------+---------+-------+
| g_var_1 | g_var_2 | group |
+---------+---------+-------+
| A       | B       | 1     |
+---------+---------+-------+
| B       | A       | 1     |
+---------+---------+-------+
| C       | D       | 2     |
+---------+---------+-------+
| D       | C       | 2     |
+---------+---------+-------+
| E       | F       | 3     |
+---------+---------+-------+
| F       | E       | 3     |
+---------+---------+-------+
| G       | H       | 4     |
+---------+---------+-------+
| H       | G       | 4     |
+---------+---------+-------+

Using pandas, I am trying to create a "group" variable based on "g_var_1" and "g_var_2". As the ASCII table above shows, rows with the same combination of "g_var_1" and "g_var_2" values are grouped together, regardless of which column holds which value. So observations with (g_var_1 == "A" and g_var_2 == "B") would be in the same group as observations with (g_var_1 == "B" and g_var_2 == "A").

The dataset that I am working with has more than a thousand rows, so doing this manually is not an optimal solution for me.

Any help would be greatly appreciated. Thanks in advance!

BENY · Accepted Answer

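First sort the two values within each row, so that (A, B) and (B, A) become the same key, then use groupby with ngroup to number the groups: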

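import numpy as np
import pandas as pd

l = ['g_var_1', 'g_var_2']
# Sort across columns (axis=1), group on the sorted pair, and shift ngroup's 0-based ids to start at 1
pd.DataFrame(np.sort(df[l], axis=1), columns=l).groupby(l).ngroup().add(1)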
Out[340]: 
0    1
1    1
2    2
3    2
4    3
5    3
6    4
7    4
dtype: int64
# .values assigns by position, so the result still lines up if df has a non-default index
df['group'] = pd.DataFrame(np.sort(df[l], axis=1), columns=l).groupby(l).ngroup().add(1).values
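For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same sort-then-ngroup idea. The sample frame is rebuilt from the table in the question; the names `cols` and `sorted_pairs` are just illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "g_var_1": list("ABCDEFGH"),
    "g_var_2": list("BADCFEHG"),
})

cols = ["g_var_1", "g_var_2"]

# Sort the two values within each row so that (A, B) and (B, A) share one key.
sorted_pairs = pd.DataFrame(np.sort(df[cols].to_numpy(), axis=1), columns=cols)

# ngroup() numbers the groups from 0; add(1) shifts them to 1-based labels.
# to_numpy() assigns by position, so this also works if df has a non-default index.
df["group"] = sorted_pairs.groupby(cols).ngroup().add(1).to_numpy()

print(df)
```

Note that groupby sorts the keys by default, so the group numbers follow the sorted key order. If you would rather number groups in order of first appearance, passing sort=False to groupby should do that.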
