rebar

Reputation: 105

Replace value based on condition within groups in a dataframe

My dataframe looks something like this:

df

group cat_col
g1    r
g1    nr
g1    r
g1    nr
g2    nr
g2    nr

I need to replace "nr" with "r" whenever the group has at least one "r". In this case, the result should be:

df_new

group cat_col
g1    r
g1    r
g1    r
g1    r
g2    nr
g2    nr

I know this question is elementary, but I've been stuck for hours and haven't figured out how to solve it. Does anyone know how?

Upvotes: 3

Views: 1097

Answers (4)

Henry Ecker

Reputation: 35686

We can also use groupby transform to check whether any value in each group equals r, then use the resulting Boolean index to set every value in those groups to r:

m = df['cat_col'].eq('r').groupby(df['group']).transform('any')
df.loc[m, 'cat_col'] = 'r'

Or replace only nr with r inside the matching groups, using the same Boolean index (useful when the column holds other values, or when several values need to be mapped):

m = df['cat_col'].eq('r').groupby(df['group']).transform('any')
df.loc[m, 'cat_col'] = df.loc[m, 'cat_col'].replace({'nr': 'r'})

df:

  group cat_col
0    g1       r
1    g1       r
2    g1       r
3    g1       r
4    g2      nr
5    g2      nr

Boolean index steps in a DataFrame:

steps_df = pd.DataFrame({
    # Find where cat_col is r
    'step 1': df['cat_col'].eq('r'),
    # Find groups which have an r value
    'step 2': df['cat_col'].eq('r').groupby(df['group']).transform('any')
})

   step 1  step 2
0    True    True
1   False    True
2    True    True
3   False    True
4   False   False
5   False   False

Setup (DataFrame and imports):

import pandas as pd

df = pd.DataFrame({
    'group': ['g1', 'g1', 'g1', 'g1', 'g2', 'g2'],
    'cat_col': ['r', 'nr', 'r', 'nr', 'nr', 'nr']
})
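
For convenience, here is a minimal self-contained sketch that combines the setup and the Boolean-index approach above. The helper name fill_r_within_groups and its parameters are my own, not part of pandas, and it assumes you want a new DataFrame back rather than an in-place change:

```python
import pandas as pd


def fill_r_within_groups(frame, group_col="group", value_col="cat_col"):
    """Return a copy where 'nr' becomes 'r' in every group containing an 'r'.

    Hypothetical helper for illustration; the name and arguments are assumptions.
    """
    out = frame.copy()
    # True for every row whose group has at least one 'r'
    has_r = out[value_col].eq("r").groupby(out[group_col]).transform("any")
    # Only touch rows that are currently 'nr'; other values stay as they are
    out.loc[has_r & out[value_col].eq("nr"), value_col] = "r"
    return out


df = pd.DataFrame({
    "group": ["g1", "g1", "g1", "g1", "g2", "g2"],
    "cat_col": ["r", "nr", "r", "nr", "nr", "nr"],
})

result = fill_r_within_groups(df)
print(result)
```

Working on a copy keeps the original df untouched, which makes it easy to compare before and after.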

Upvotes: 1

akuiper

Reputation: 215137

Use groupby.transform:

df.cat_col.groupby(df.group).transform(lambda g: 'r' if g.eq('r').any() else g)

0     r
1     r
2     r
3     r
4    nr
5    nr
Name: cat_col, dtype: object

If you only need to replace nr with r:

df.cat_col = df.cat_col.groupby(df.group).transform(
  lambda g: g.replace('nr', 'r') if g.eq('r').any() else g
)
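
The two variants above only differ when the column holds something besides r and nr. A small sketch to illustrate, assuming a made-up third category x that is not in the question's data:

```python
import pandas as pd

# Hypothetical data with an extra category 'x' to show the difference
df = pd.DataFrame({
    "group": ["g1", "g1", "g1", "g2", "g2"],
    "cat_col": ["r", "nr", "x", "nr", "x"],
})

# Variant 1: every value in a group that contains 'r' becomes 'r'
overwrite_all = df.cat_col.groupby(df.group).transform(
    lambda g: "r" if g.eq("r").any() else g
)

# Variant 2: only 'nr' is replaced; 'x' is left alone
replace_only = df.cat_col.groupby(df.group).transform(
    lambda g: g.replace("nr", "r") if g.eq("r").any() else g
)

print(pd.DataFrame({"overwrite_all": overwrite_all, "replace_only": replace_only}))
```

With the question's data both variants give the same result, so the choice only matters if other categories can appear.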

Upvotes: 3

osint_alex

Reputation: 1022

This takes a few more steps, but I think it is clear to follow:

groups_that_pass_the_condition = []
groups = df.group.unique()

for group in groups:
    cat_col_by_group = df.loc[df.group == group, 'cat_col']
    value_counts = cat_col_by_group.value_counts()
    if 'r' in value_counts.index:
        if value_counts.r >= 1:
            groups_that_pass_the_condition.append(group)

for group_that_passed in groups_that_pass_the_condition:
    df.loc[df.group == group_that_passed] = df.loc[df.group == group_that_passed].replace('nr', 'r')

print(df)

OUT:

  group cat_col
0    g1       r
1    g1       r
2    g1       r
3    g1       r
4    g2      nr
5    g2      nr

Upvotes: 0

IoaTzimas

Reputation: 10624

Here is one way to do it:

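# Groups that contain at least one 'r'
groups_with_r = df[df['cat_col'] == 'r']['group'].to_list()

# Replace 'nr' with 'r' only in the rows belonging to those groups
in_group = df['group'].isin(groups_with_r)
df.loc[in_group, 'cat_col'] = df.loc[in_group, 'cat_col'].replace('nr', 'r')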

Output:

>>> print(df)
  group cat_col
0    g1       r
1    g1       r
2    g1       r
3    g1       r
4    g2      nr
5    g2      nr

Upvotes: 0
