Reputation: 3432
I have a dataframse such as
Groups SP Letters
G1 SP1 A
G1 SP1 Z
G1 SP1 A
G1 SP2 X
G1 SP3 X
G1 SP4 B
G2 SP2 A
G2 SP5 B
G2 SP3 X
G2 SP23 A
and I would like to add a column (Nb_column
) which would be the count of unique SP
value within each Groups
but where Letters
are in a list the_list=['A','B']
here I should then get:
Groups SP Letters Nb_column
G1 SP1 A 2
G1 SP1 Z 2
G1 SP1 A 2
G1 SP2 X 2
G1 SP3 X 2
G1 SP4 B 2
G2 SP2 A 3
G2 SP5 B 3
G2 SP3 X 3
G2 SP23 A 3
Where
SP
(SP1,SP4) with a Letters
in the_list
for the G1.
AndSP
(SP2,SP5 and SP23) with a Letters
in the_list
for the G2.Does someone have an idea in pandas please ?
Upvotes: 3
Views: 102
Reputation: 35636
We can select from the SP
column based on where Letters
isin
the_list
using loc
then use groupby nunique
to get the corresponding number of unique values per group. Then rename
and join
back to the DataFrame to make the new column aligned with Groups
:
df = df.join(
df.loc[df['Letters'].isin(the_list), 'SP']
.groupby(df['Groups']).nunique()
.rename('Nb_column'),
on='Groups'
)
Alternatively to join
we can reindex
to scale and create the new column from that:
df['Nb_column'] = (
df.loc[df['Letters'].isin(the_list), 'SP']
.groupby(df['Groups']).transform('nunique')
.reindex(index=df.index, method='ffill')
)
df
:
Groups SP Letters Nb_column
0 G1 SP1 A 2
1 G1 SP1 Z 2
2 G1 SP1 A 2
3 G1 SP2 X 2
4 G1 SP3 X 2
5 G1 SP4 B 2
6 G2 SP2 A 3
7 G2 SP5 B 3
8 G2 SP3 X 3
9 G2 SP23 A 3
Upvotes: 4
Reputation: 260600
You can use groupby
+transform
+nunique
on a masked version of SP (by default the NaNs are not counted by nunique
):
df['Nb_columns'] = (df['SP'].where(df['Letters'].isin(the_list))
.groupby(df['Groups']).transform('nunique')
)
output:
Groups SP Letters Nb_columns
0 G1 SP1 A 2
1 G1 SP1 Z 2
2 G1 SP1 A 2
3 G1 SP2 X 2
4 G1 SP3 X 2
5 G1 SP4 B 2
6 G2 SP2 A 3
7 G2 SP5 B 3
8 G2 SP3 X 3
9 G2 SP23 A 3
Upvotes: 5
Reputation: 150735
Use isin
to check for the existence, then groupby().nunique()
and map back:
nb = df[df['Letters'].isin(lst)].groupby('Groups')['SP'].nunique()
df['Nb_column'] = df['Groups'].map(nb)
Upvotes: 4