warped

Reputation: 9482

Apply changes to column based on condition in pandas groupby

Apologies if I have overlooked a similar problem that was solved elsewhere. These are the posts that I looked into before asking this question:

Pandas DataFrame groupby based on condition

pandas groupby replace based on condition

Python pandas dataframe group by based on a condition

problem:

Given a dataframe

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'a': [1,2,2,3,3,4,5,5,6,6],
    'b': np.random.rand(10),
    'c': 10*[0],
})

with a column a that holds identifiers and a column b that holds random values, I want to relabel values in column c.

If there is only one entry for a, no change should be applied.

If there are two entries for a, I want to relabel the entry in c for the row where b is minimal:

    a   b           c
0   1   0.472015    0  # <-- only one entry for a => nothing changes
1   2   0.553018    2  # <-- b is minimal => c gets value 2
2   2   0.770302    0
3   3   0.992023    0
4   3   0.119448    2  # <-- b is minimal => c gets value 2

approach:

Combination of groupby, agg and .loc

g = df.groupby('a').agg({'b': [np.argmin, np.argmax]})
df.loc[g[g[('b', 'argmin')] != g[('b', 'argmax')]][('b', 'argmin')], 'c' ] = 2

This works, but seems quite roundabout.
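For reference, the same idea can be written with the label-based aggregations idxmin and idxmax, which makes it explicit that index labels (not positions within a group) are passed to .loc. This is only a sketch: the fixed b values are made up so the result is reproducible, and the variable names extremes and rows are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 2, 3, 3, 4, 5, 5, 6, 6],
    'b': [1, 1, 2, 1, 2, 1, 1, 1, 2, 1],  # fixed values instead of random ones
    'c': 10 * [0],
})

# Index label of the smallest and largest b within each group of a
extremes = df.groupby('a')['b'].agg(['idxmin', 'idxmax'])

# Single-row groups and groups whose b values are all equal have
# idxmin == idxmax, so they are dropped here
rows = extremes.loc[extremes['idxmin'] != extremes['idxmax'], 'idxmin'].tolist()

df.loc[rows, 'c'] = 2
print(df)
```

With this data, only rows 1, 3 and 9 are relabelled.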

question

is there a less clumsy way to go about this?

thanks

help is greatly appreciated!

edit:

Groups have either one or two members. If both members have the same b value, nothing should happen.

df = pd.DataFrame({
    'a': [1,2,2,3,3,4,5,5,6,6],
    'b': [1,1,2,1,2,1,1,1,2,1],
    'c': 10*[0],
})

df.loc[df['b'].eq(df.groupby('a')['b'].transform('min')),'c']=2

out:

    a   b   c
0   1   1   2
1   2   1   2
2   2   2   0
3   3   1   2
4   3   2   0
5   4   1   2
6   5   1   2
7   5   1   2
8   6   2   0
9   6   1   2

df.loc[df.index==df.groupby('a')['b'].transform('idxmin'),'c']=2

    a   b   c
0   1   1   2  # <-- a has only one member, so this shouldn't be changed
1   2   1   2
2   2   2   0
3   3   1   2
4   3   2   0
5   4   1   2
6   5   1   2
7   5   1   0
8   6   2   0
9   6   1   2

Upvotes: 1

Views: 1814

Answers (1)

anky

Reputation: 75080

You can use groupby+transform with df.loc[]:

df.loc[df['b'].eq(df.groupby('a')['b'].transform('min')),'c']=2

Example:

df.loc[df['b'].eq(df.groupby('a')['b'].transform('min')),'c']=2
print(df)
   a         b  c
0  1  0.126956  2
1  2  0.249078  2
2  2  0.929619  0
3  3  0.013735  2
4  3  0.192781  0
5  4  0.268828  2
6  5  0.649238  2
7  5  0.767545  0
8  6  0.678478  2
9  6  0.815916  0
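To see why this selects the right rows, it can help to look at the intermediate values. The small frame below is a made-up example (not the data from the question); group_min and mask are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 2, 3, 3],
    'b': [0.4, 0.5, 0.7, 0.9, 0.1],
})

# transform('min') returns one value per row: the minimum of b in that row's group
group_min = df.groupby('a')['b'].transform('min')
print(group_min.tolist())  # [0.4, 0.5, 0.5, 0.1, 0.1]

# Comparing b against its group minimum yields a boolean mask aligned with df
mask = df['b'].eq(group_min)
print(mask.tolist())  # [True, True, False, False, True]
```

Because the mask has the same index as df, it can be passed straight to df.loc. Note that a group with a single row is always its own minimum, so that row is selected as well.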

EDIT:

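g = df.groupby('a')
# True for the first row holding the minimum of b in its group,
# but only when the group has more than one row
cond = pd.Series(df.index==g['b'].transform('idxmin')) & g['a'].transform('count').gt(1)
df.loc[cond,'c'] = 2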

print(df)

   a  b  c
0  1  1  0
1  2  1  2
2  2  2  0
3  3  1  2
4  3  2  0
5  4  1  0
6  5  1  2
7  5  1  0
8  6  2  0
9  6  1  2
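In the output above, row 6 is still relabelled even though both members of group 5 have the same b value. If such ties should be left untouched as well, one possible variant (a sketch, not part of the original answer) is to require more than one distinct b value per group instead of more than one row. The names is_min and has_distinct are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 2, 3, 3, 4, 5, 5, 6, 6],
    'b': [1, 1, 2, 1, 2, 1, 1, 1, 2, 1],
    'c': 10 * [0],
})

g = df.groupby('a')['b']

# Rows whose b equals the minimum of their group
is_min = df['b'].eq(g.transform('min'))

# Groups with at least two distinct b values; this excludes
# single-row groups and groups where all b values are tied
has_distinct = g.transform('nunique').gt(1)

df.loc[is_min & has_distinct, 'c'] = 2
print(df['c'].tolist())  # [0, 2, 0, 2, 0, 0, 0, 0, 0, 2]
```

This relies on groups having at most two members, as stated in the question; with larger groups, several rows could share the minimum and all of them would be relabelled.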

Upvotes: 1
