Reputation: 441
I am trying to groupby 2 columns with an OR condition to create a group categorical count where if a number is repeated in either groupid1 or groupid2 it will create a 3rd unique grouping number.
input df
ID GROUPID1 GROUPID2
F632 173 1315
F632 173 1315
F966 173 1519
F966 134 1519
F173 704 1437
F455 704 1201
F456 702 1202
F457 800 1500
F966 173 1519
F966 134 1519
desired result - where 173/134 or 1315/1519 create groupid3 '1'; 704 = 2, and 702 and 800 are unique.
ID GROUP1 GROUP2 GROUP3
F632 173 1315 1
F632 173 1315 1
F966 173 1519 1
F966 134 1519 1
F173 704 1437 2
F455 704 1201 2
F456 702 1202 3
F457 800 1500 4
F966 173 1519 1
F966 134 1519 1
Upvotes: 1
Views: 59
Reputation: 323386
Let us try duplicated
with or
s=(~(df.GROUPID1.duplicated()|df.GROUPID2.duplicated())).cumsum()
0 1
1 1
2 1
3 1
4 2
5 2
6 3
7 4
dtype: int64
df['GROUP3']=s
Upvotes: 2