Akash
Akash

Reputation: 59

Pandas : How to get mean of the column from the filtered values of other column?

 Name      Runs   Score         
 Richard   2000    2
 Richard   3000    3
 Richard   7000    8
 Richard   8000    7
 Richard   2000    4
 Richard   9000    7.5

lessthan1 = str(len(df[(df['Score']>=0) & (df['Score']<1)]))
oneto5 = str(len(df[(df['Score']>=1) & (df['Score']<5)]))
fiveto10 = str(len(df[(df['Score']>=5) & (df['Score']<10)]))

For example in category >=1 to <5..am expecting (2000+3000+2000)/31000

Upvotes: 0

Views: 609

Answers (4)

BCJuan
BCJuan

Reputation: 825

Also, you can use pandas apply functions with lambdas:

df.loc[df['Score'].apply(lambda x: x < 1 and x > 0), 'Runs'].mean()
df.loc[df['Score'].apply(lambda x: x >= 1 and x < 5), 'Runs'].mean()
df.loc[df['Score'].apply(lambda x: x >= 5 and x < 10), 'Runs'].mean()

Result:

5166.666666666667
nan
2333.3333333333335

To obtain mean averaged over total sum in score you have to change mean() by sum() and divide by total sum(). For example

df.loc[df['Score'].apply(lambda x: x >= 1 and x < 5), 'Runs'].sum()/df['Runs'].sum()

Result:

0.22580645161290322 = (2000 + 3000 + 2000)/31000

Upvotes: 0

Jon Clements
Jon Clements

Reputation: 142216

Looks like you can use pd.cut to bin the score data and use that to group the runs, eg:

df.groupby(pd.cut(df['Score'], [0, 1, 5, 10]))['Runs'].mean()

Gives you:

Score
(0, 1]             NaN
(1, 5]     2333.333333
(5, 10]    8000.000000
Name: Runs, dtype: float64

Upvotes: 3

Reedinationer
Reedinationer

Reputation: 5774

import pandas as pd
array = [
    ["Richard",   2000,    2],
    ["Richard",   3000,    3],
    ["Richard",   7000,    8],
    ["Richard",   8000,    7],
    ["Richard",   2000,    4],
    ["Richard",   9000,    7.5]]
df = pd.DataFrame(array, columns=["Name", "Runs", "Score"])
print(df[(df['Score'] >= 0) & (df['Score'] < 1)]["Runs"].mean())
print(df[(df['Score'] >= 1) & (df['Score'] < 5)]["Runs"].mean())
print(df[(df['Score'] >= 5) & (df['Score'] < 10)]["Runs"].mean())

nan

2333.3333333333335

8000.0

You need to select just the Runs column and then use the .mean() method on that after you implement your boolean selection

Upvotes: 0

Snedecor
Snedecor

Reputation: 739

You could try using boolean masking as follows,

lessthan1 = df.loc[(df['Score'] >= 0) & (df['Score'] <1), 'Runs'].mean()
oneto5 = df.loc[(df['Score']>=1) & (df['Score']<5), 'Runs'].mean()
fiveto10 = df.loc[(df['Score']>=5) & (df['Score']<10), 'Runs'].mean()

print('Less than 1:', lessthan1)
print('One to Five:', oneto5)
print('Five to Ten:', fiveto10)

Upvotes: 0

Related Questions