Reputation: 3432
Hello I have a dataframe such as:
COL1_1 COL1_3 COL2
Chr1_0 Canis_lupus A
Chr1_0 Canis_lupus A
Chr1_0 Canis_lupus B
Chr1_0 Canis_lupus B
Chr1_0 Canis_lupus B
Chr1_0 Felis_cattus B
Chr1_0 Felis_cattus B
Chr2_0 Felis_cattus A
Chr2_0 Felis_cattus B
Chr2_1 Felis_cattus C
Chr2_1 Felis_cattus D
Chr2_1 Felis_cattus E
and the idea is within each COL1_1
and COL1_3
count the number of distinct COL2
.
ex : for Chr1_0
and Canis_lupus
there are 2 distinct COL2
(A and B), so I put 2 into the new COL3
.
if there is only one value, I put a 0.
here I should then get
COL1_1 COL1_3 COL2 COL3
Chr1_0 Canis_lupus A 2
Chr1_0 Canis_lupus A 2
Chr1_0 Canis_lupus B 2
Chr1_0 Canis_lupus B 2
Chr1_0 Canis_lupus B 2
Chr1_0 Felis_cattus B 0
Chr1_0 Felis_cattus B 0
Chr2_0 Felis_cattus A 2
Chr2_0 Felis_cattus B 2
Chr2_1 Felis_cattus C 3
Chr2_1 Felis_cattus D 3
Chr2_1 Felis_cattus E 3
maybe an idea would be to groupby (COL1_1and
COL1_3`) and count number of distinct COL2 values.
Upvotes: 1
Views: 647
Reputation: 862761
Use GroupBy.transform
with DataFrameGroupBy.nunique
and Series.mask
for replace 1
to 0
:
df['COL3'] = (df.groupby(['COL1_1', 'COL1_3']).COL2.transform('nunique')
.mask(lambda x: x == 1, 0))
Or use replace
:
df['COL3'] = df.groupby(['COL1_1', 'COL1_3']).COL2.transform('nunique').replace({1:0})
print (df)
COL1_1 COL1_3 COL2 COL3
0 Chr1_0 Canis_lupus A 2
1 Chr1_0 Canis_lupus A 2
2 Chr1_0 Canis_lupus B 2
3 Chr1_0 Canis_lupus B 2
4 Chr1_0 Canis_lupus B 2
5 Chr1_0 Felis_cattus B 0
6 Chr1_0 Felis_cattus B 0
7 Chr2_0 Felis_cattus A 2
8 Chr2_0 Felis_cattus B 2
9 Chr2_1 Felis_cattus C 3
10 Chr2_1 Felis_cattus D 3
11 Chr2_1 Felis_cattus E 3
Upvotes: 3