Reputation: 105
My dataframe looks something like this:
df
group cat_col
g1 r
g1 nr
g1 r
g1 nr
g2 nr
g2 nr
I need to replace "nr" with "r" whenever the group has at least one "r". In this case, I need it to return:
df_new
group cat_col
g1 r
g1 r
g1 r
g1 r
g2 nr
g2 nr
I know this question is elementary, but I've been stuck for hours and haven't figured out how to solve it. Does anyone know how?
Upvotes: 3
Views: 1097
Reputation: 35686
We can also use groupby transform to see if there are any values in each group that eq r, and use this Boolean index to then replace those values with r:
m = df['cat_col'].eq('r').groupby(df['group']).transform('any')
df.loc[m, 'cat_col'] = 'r'
Or conditionally replace nr with r using the same Boolean index (in case there are multiple replace values):
m = df['cat_col'].eq('r').groupby(df['group']).transform('any')
df.loc[m, 'cat_col'] = df.loc[m, 'cat_col'].replace({'nr': 'r'})
df:
group cat_col
0 g1 r
1 g1 r
2 g1 r
3 g1 r
4 g2 nr
5 g2 nr
Boolean index steps in a DataFrame:
steps_df = pd.DataFrame({
    # Find where cat_col is r
    'step 1': df['cat_col'].eq('r'),
    # Find groups which have an r value
    'step 2': df['cat_col'].eq('r').groupby(df['group']).transform('any')
})
step 1 step 2
0 True True
1 False True
2 True True
3 False True
4 False False
5 False False
Setup (DataFrame and imports):
import pandas as pd
df = pd.DataFrame({
    'group': ['g1', 'g1', 'g1', 'g1', 'g2', 'g2'],
    'cat_col': ['r', 'nr', 'r', 'nr', 'nr', 'nr']
})
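Putting the setup and the Boolean index together, here is a minimal self-contained sketch. The function name promote_nr_to_r and the choice to work on a copy are my own assumptions for illustration, not part of the approach above; the mask itself is the same groupby transform('any') expression.

```python
import pandas as pd


def promote_nr_to_r(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy where 'nr' becomes 'r' in every group that contains an 'r'.

    Hypothetical helper: assumes columns named 'group' and 'cat_col'.
    """
    # True for every row whose group has at least one 'r'
    has_r = frame['cat_col'].eq('r').groupby(frame['group']).transform('any')
    out = frame.copy()
    # Only touch rows that are 'nr' and belong to a qualifying group
    out.loc[has_r & out['cat_col'].eq('nr'), 'cat_col'] = 'r'
    return out


df = pd.DataFrame({
    'group': ['g1', 'g1', 'g1', 'g1', 'g2', 'g2'],
    'cat_col': ['r', 'nr', 'r', 'nr', 'nr', 'nr']
})
df_new = promote_nr_to_r(df)
print(df_new)
```

Working on a copy leaves the original df untouched, which may be convenient if you want to compare before and after; assign the result back to df if you prefer the in-place behaviour.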
Upvotes: 1
Reputation: 215137
Use groupby.transform:
df.cat_col.groupby(df.group).transform(lambda g: 'r' if g.eq('r').any() else g)
0 r
1 r
2 r
3 r
4 nr
5 nr
Name: cat_col, dtype: object
If you only need to replace nr with r:
df.cat_col = df.cat_col.groupby(df.group).transform(
    lambda g: g.replace('nr', 'r') if g.eq('r').any() else g
)
Upvotes: 3
Reputation: 1022
This has a few more steps, but I think it is quite clear to follow:
groups_that_pass_the_condition = []
groups = df.group.unique()
for group in groups:
    cat_col_by_group = df.loc[df.group == group]['cat_col']
    value_counts = cat_col_by_group.value_counts()
    if 'r' in value_counts.index:
        if value_counts.r >= 1:
            groups_that_pass_the_condition.append(group)
for group_that_passed in groups_that_pass_the_condition:
    df.loc[df.group == group_that_passed] = df.loc[df.group == group_that_passed].replace('nr', 'r')
print(df)
OUT:
group cat_col
0 g1 r
1 g1 r
2 g1 r
3 g1 r
4 g2 nr
5 g2 nr
Upvotes: 0
Reputation: 10624
Here is one way to do it:
# Groups that contain at least one 'r'
l = df[df['cat_col'] == 'r']['group'].to_list()
# Replace 'nr' with 'r' only in those groups
df.loc[df['group'].isin(l), 'cat_col'] = df.loc[df['group'].isin(l), 'cat_col'].replace('nr', 'r')
Output:
>>> print(df)
group cat_col
0 g1 r
1 g1 r
2 g1 r
3 g1 r
4 g2 nr
5 g2 nr
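A slightly more compact variant of the same isin idea, as a sketch: it builds one Boolean mask for the rows that need changing and assigns 'r' directly. The names groups_with_r and to_fix are mine, chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['g1', 'g1', 'g1', 'g1', 'g2', 'g2'],
    'cat_col': ['r', 'nr', 'r', 'nr', 'nr', 'nr']
})

# Distinct groups that contain at least one 'r'
groups_with_r = df.loc[df['cat_col'].eq('r'), 'group'].unique()

# Rows that are 'nr' and belong to one of those groups
to_fix = df['group'].isin(groups_with_r) & df['cat_col'].eq('nr')

# Overwrite only those rows; any other values in cat_col stay as they are
df.loc[to_fix, 'cat_col'] = 'r'
print(df)
```

Using unique() avoids carrying duplicate group labels into isin, which does not change the result but keeps the lookup list short on large frames.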
Upvotes: 0