chippycentra
chippycentra

Reputation: 3432

Add a new column with count depending on occurence within a list in pandas

I have a dataframse such as

Groups SP   Letters
G1     SP1  A
G1     SP1  Z
G1     SP1  A
G1     SP2  X
G1     SP3  X
G1     SP4  B
G2     SP2  A
G2     SP5  B
G2     SP3  X
G2     SP23 A

and I would like to add a column (Nb_column) which would be the count of unique SP value within each Groups but where Letters are in a list the_list=['A','B']

here I should then get:

Groups SP   Letters Nb_column
G1     SP1  A       2
G1     SP1  Z       2
G1     SP1  A       2
G1     SP2  X       2
G1     SP3  X       2
G1     SP4  B       2
G2     SP2  A       3
G2     SP5  B       3
G2     SP3  X       3
G2     SP23 A       3

Where

Does someone have an idea in pandas please ?

Upvotes: 3

Views: 102

Answers (3)

Henry Ecker
Henry Ecker

Reputation: 35636

We can select from the SP column based on where Letters isin the_list using loc then use groupby nunique to get the corresponding number of unique values per group. Then rename and join back to the DataFrame to make the new column aligned with Groups:

df = df.join(
    df.loc[df['Letters'].isin(the_list), 'SP']
        .groupby(df['Groups']).nunique()
        .rename('Nb_column'),
    on='Groups'
)

Alternatively to join we can reindex to scale and create the new column from that:

df['Nb_column'] = (
    df.loc[df['Letters'].isin(the_list), 'SP']
        .groupby(df['Groups']).transform('nunique')
        .reindex(index=df.index, method='ffill')
)

df:

  Groups    SP Letters  Nb_column
0     G1   SP1       A          2
1     G1   SP1       Z          2
2     G1   SP1       A          2
3     G1   SP2       X          2
4     G1   SP3       X          2
5     G1   SP4       B          2
6     G2   SP2       A          3
7     G2   SP5       B          3
8     G2   SP3       X          3
9     G2  SP23       A          3

Upvotes: 4

mozway
mozway

Reputation: 260600

You can use groupby+transform+nunique on a masked version of SP (by default the NaNs are not counted by nunique):

df['Nb_columns'] = (df['SP'].where(df['Letters'].isin(the_list))
                     .groupby(df['Groups']).transform('nunique')
                    )

output:

  Groups    SP Letters  Nb_columns
0     G1   SP1       A           2
1     G1   SP1       Z           2
2     G1   SP1       A           2
3     G1   SP2       X           2
4     G1   SP3       X           2
5     G1   SP4       B           2
6     G2   SP2       A           3
7     G2   SP5       B           3
8     G2   SP3       X           3
9     G2  SP23       A           3

Upvotes: 5

Quang Hoang
Quang Hoang

Reputation: 150735

Use isin to check for the existence, then groupby().nunique() and map back:

nb = df[df['Letters'].isin(lst)].groupby('Groups')['SP'].nunique()

df['Nb_column'] = df['Groups'].map(nb)

Upvotes: 4

Related Questions