Feng Chen
Feng Chen

Reputation: 2253

In Python, how to do group by + mutate + ifelse like R?

I usually use R. If I have a data like:

Product    Index   Value
   a         1       0.5
   a         1       0.4
   c         1       1.4
   c         2       0.75
   e         2       0.6
   f         3       0.9

If my R code is :

a <- data %>%
  group_by(Product) %>%
  mutate(Flag=ifelse(all(Index==1),'right','wrong'))

It means, I group the data by Product first. Then for each group, I will give it a new field called Flag. If Index in this group is all 1, then Flag is right, otherwise it is wrong. At the same time, all the reocrds are remained. So, The result should look like:

Product    Index   Value    Flag
   a         1       0.5    right
   a         1       0.4    right
   c         1       1.4    wrong
   c         2       0.75   wrong
   e         2       0.6    wrong
   f         3       0.9    wrong

My question is: how to do the same operations in python? I have tried, np.where, groupby, transform and other functions. I probably combine them the wrong way.

Upvotes: 2

Views: 3497

Answers (3)

jezrael
jezrael

Reputation: 863166

Use GroupBy.transform with mean, compare by 1 and numpy.where for values by condition:

df['Flag'] = np.where(df.groupby('Product')['Index'].transform('mean')== 1, 'right', 'Wrong')
print (df)
  Product  Index  Value   Flag
0       a      1   0.50  right
1       a      1   0.40  right
2       c      1   1.40  Wrong
3       c      2   0.75  Wrong
4       e      2   0.60  Wrong
5       f      3   0.90  Wrong

Detail:

print (df.groupby('Product')['Index'].transform('mean'))
0    1.0
1    1.0
2    1.5
3    1.5
4    2.0
5    3.0
Name: Index, dtype: float64

Another solution:

First compare by 1 by eq and then GroupBy.transform with all for check if all Trues per group and numpy.where for values by condition:

df['Flag'] = np.where(df['Index'].eq(1).groupby(df['Product']).transform('all'), 
                      'right', 'Wrong')
print (df)
  Product  Index  Value   Flag
0       a      1   0.50  right
1       a      1   0.40  right
2       c      1   1.40  Wrong
3       c      2   0.75  Wrong
4       e      2   0.60  Wrong
5       f      3   0.90  Wrong

Detail:

print (df['Index'].eq(1).groupby(df['Product']).transform('all'))
0     True
1     True
2    False
3    False
4    False
5    False
Name: Index, dtype: bool

Upvotes: 2

datapug
datapug

Reputation: 2441

Using transform is an option.

import pandas as pd

df = pd.DataFrame({'Product': ['a', 'a', 'c', 'c', 'e', 'f'],
               'Index': [1, 1, 1, 2, 2, 3], 
              'Value': [0.5, 0.4, 1.4, 0.75, 0.6, 0.9]})
df['Flag'] = df.groupby('Product')['Index'].transform(lambda x: 'right' if sum(x)/len(x) == 1 else 'wrong')
df

On a side note, if Flag values are just 'right' and 'wrong', substituting them with 0,1 values might be more efficient.

Upvotes: 5

user3483203
user3483203

Reputation: 51165

You may use unique() and groupby() to make sure all Products have an index of 1, and then map to your new column:

In [51]: df['Flag'] = df['Product'].map(df.groupby('Product')['Index'].unique().apply(lambda row: 'right' if all(row==[1]) else 'wrong'))

In [52]: df
Out[52]:
  Product  Index  Value   Flag
0       a      1   0.50  right
1       a      1   0.40  right
2       c      1   1.40  wrong
3       c      2   0.75  wrong
4       e      2   0.60  wrong
5       f      3   0.90  wrong

Upvotes: 2

Related Questions