Reputation: 2253
I usually use R. If I have a data like:
Product Index Value
a 1 0.5
a 1 0.4
c 1 1.4
c 2 0.75
e 2 0.6
f 3 0.9
If my R code is :
a <- data %>%
group_by(Product) %>%
mutate(Flag=ifelse(all(Index==1),'right','wrong'))
It means, I group the data by Product first. Then for each group, I will give it a new field called Flag. If Index in this group is all 1, then Flag is right, otherwise it is wrong. At the same time, all the reocrds are remained. So, The result should look like:
Product Index Value Flag
a 1 0.5 right
a 1 0.4 right
c 1 1.4 wrong
c 2 0.75 wrong
e 2 0.6 wrong
f 3 0.9 wrong
My question is: how to do the same operations in python? I have tried, np.where, groupby, transform and other functions. I probably combine them the wrong way.
Upvotes: 2
Views: 3497
Reputation: 863166
Use GroupBy.transform
with mean
, compare by 1
and numpy.where
for values by condition:
df['Flag'] = np.where(df.groupby('Product')['Index'].transform('mean')== 1, 'right', 'Wrong')
print (df)
Product Index Value Flag
0 a 1 0.50 right
1 a 1 0.40 right
2 c 1 1.40 Wrong
3 c 2 0.75 Wrong
4 e 2 0.60 Wrong
5 f 3 0.90 Wrong
Detail:
print (df.groupby('Product')['Index'].transform('mean'))
0 1.0
1 1.0
2 1.5
3 1.5
4 2.0
5 3.0
Name: Index, dtype: float64
Another solution:
First compare by 1
by eq
and then GroupBy.transform
with all
for check if all True
s per group and numpy.where
for values by condition:
df['Flag'] = np.where(df['Index'].eq(1).groupby(df['Product']).transform('all'),
'right', 'Wrong')
print (df)
Product Index Value Flag
0 a 1 0.50 right
1 a 1 0.40 right
2 c 1 1.40 Wrong
3 c 2 0.75 Wrong
4 e 2 0.60 Wrong
5 f 3 0.90 Wrong
Detail:
print (df['Index'].eq(1).groupby(df['Product']).transform('all'))
0 True
1 True
2 False
3 False
4 False
5 False
Name: Index, dtype: bool
Upvotes: 2
Reputation: 2441
Using transform is an option.
import pandas as pd
df = pd.DataFrame({'Product': ['a', 'a', 'c', 'c', 'e', 'f'],
'Index': [1, 1, 1, 2, 2, 3],
'Value': [0.5, 0.4, 1.4, 0.75, 0.6, 0.9]})
df['Flag'] = df.groupby('Product')['Index'].transform(lambda x: 'right' if sum(x)/len(x) == 1 else 'wrong')
df
On a side note, if Flag values are just 'right' and 'wrong', substituting them with 0,1 values might be more efficient.
Upvotes: 5
Reputation: 51165
You may use unique()
and groupby()
to make sure all Products have an index of 1, and then map to your new column:
In [51]: df['Flag'] = df['Product'].map(df.groupby('Product')['Index'].unique().apply(lambda row: 'right' if all(row==[1]) else 'wrong'))
In [52]: df
Out[52]:
Product Index Value Flag
0 a 1 0.50 right
1 a 1 0.40 right
2 c 1 1.40 wrong
3 c 2 0.75 wrong
4 e 2 0.60 wrong
5 f 3 0.90 wrong
Upvotes: 2