liu gang
liu gang

Reputation: 331

pandas number >mean(), or <mean() , than output a number

I have a dataframe like this:

Id  F   M   R
7   1   286 907
12  1   286 907
17  1   186 1271
21  1   296 905
30  1   308 908
32  1   267 905
40  2   591 788
41  1   486 874
47  1   686 906
74  1   230 907

for each row if f> f's mean() and M> M's mean() and R>R's mean(),then output in new column is "1".

like this:

Id  F   M   R    score
7   1   286 907    1
12  1   286 907    0 
17  1   186 1271   1 
21  1   296 905
30  1   308 908
32  1   267 905
40  2   591 788
41  1   486 874
47  1   686 906
74  1   230 907

Upvotes: 1

Views: 68

Answers (2)

jezrael
jezrael

Reputation: 863301

You can use numpy.where with mask created with comparing 3 columns with their mean and then use all for check all rows are True:

# I modify last value in row with index 6 to 1000
print (df)
   Id  F    M     R
0   7  1  286   907
1  12  1  286   907
2  17  1  186  1271
3  21  1  296   905
4  30  1  308   908
5  32  1  267   905
6  40  2  591  1000
7  41  1  486   874
8  47  1  686   906
9  74  1  230   907
print (df.F.mean())
1.1
print (df.M.mean())
362.2
print (df.R.mean())
949.0

print (df[['F','M','R']] > df[['F','M','R']].mean())
       F      M      R
0  False  False  False
1  False  False  False
2  False  False   True
3  False  False  False
4  False  False  False
5  False  False  False
6   True   True   True
7  False   True  False
8  False   True  False
9  False  False  False

mask = (df[['F','M','R']] > df[['F','M','R']].mean()).all(1)
print (mask)
0    False
1    False
2    False
3    False
4    False
5    False
6     True
7    False
8    False
9    False
dtype: bool
df['score'] = np.where(mask,1,0)
print (df)
   Id  F    M     R  score
0   7  1  286   907      0
1  12  1  286   907      0
2  17  1  186  1271      0
3  21  1  296   905      0
4  30  1  308   908      0
5  32  1  267   905      0
6  40  2  591  1000      1
7  41  1  486   874      0
8  47  1  686   906      0
9  74  1  230   907      0

If condition is changed:

mask = (df.F > df.F.mean()) & (df.M < df.M.mean()) & (df.R < df.R.mean())
print (mask)
0    False
1    False
2    False
3    False
4    False
5    False
6    False
7    False
8    False
9    False
dtype: bool
df['score'] = np.where(mask,2,0)
print (df)
   Id  F    M     R  score
0   7  1  286   907      0
1  12  1  286   907      0
2  17  1  186  1271      0
3  21  1  296   905      0
4  30  1  308   908      0
5  32  1  267   905      0
6  40  2  591  1000      0
7  41  1  486   874      0
8  47  1  686   906      0
9  74  1  230   907      0

EDIT:

I think you can first check if in some conditions are not in some row more as one values by:

mask1 = (df.F > df.F.mean()) & (df.M > df.M.mean()) & (df.R > df.R.mean())

mask2 = (df.F > df.F.mean()) & (df.M < df.M.mean()) & (df.R < df.R.mean())

mask3 = (df.F < df.F.mean()) & (df.M < df.M.mean()) & (df.R < df.R.mean())


df['score1'] = np.where(mask1,1,0)
df['score2'] = np.where(mask2,2,0)
df['score3'] = np.where(mask3,3,0)

If not, use:

df.loc[mask1, 'score'] = 1
df.loc[mask2, 'score'] = 2
df.loc[mask3, 'score'] = 3
df.score.fillna(0, inplace=True)

Upvotes: 2

Satya
Satya

Reputation: 5907

df.loc[df['f']>df['f'].mean(),['f']] += 1
df.loc[df['m']>df['m'].mean(),['m']] += 1
df.loc[df['r']>df['r'].mean(),['r']] += 1

Have not tested this,please try and comment if works.

Or try this one

df['f'] = [x+1 for x in df['f'] if x>df['f'].mean()]
df['m'] = [x+1 for x in df['m'] if x>df['m'].mean()]
df['r'] = [x+1 for x in df['r'] if x>df['r'].mean()]

Upvotes: 0

Related Questions