Reputation: 71
I have a DataFrame incomeData with the columns Country, Region and Income. I am trying to use aggregation to return the mean, min, max and counts per region. I also want to count the countries where the income is greater than 100.
import pandas as pd

raw_data = {'Country': ['A', 'B', 'C', 'D', 'E'],
            'Region': ['X', 'X', 'X', 'Y', 'Y'],
            'Income': [100, 200, 300, 100, 200]
            }
incomeData = pd.DataFrame(raw_data, columns=['Country', 'Region', 'Income'])
regionGroup = incomeData.groupby(['Region'], as_index=False)
groupCount = lambda x: x.count()
#CountHighIncome = ?
aggregations = {
    'Country': {groupCount},
    'Income': {'min', 'max', 'mean', 'median'}  # , CountHighIncome
}
incomeSummary = regionGroup.agg(aggregations)
incomeSummary
  Region  Country Income
         <lambda> median  max mean  min CountHighIncome
0      X        3    200  300  200  100               2
1      Y        2    150  200  150  100               1
Please let me know whether the lambda approach used to count countries within a region can be extended to count only the countries where the income is greater than 100, or whether there is a better alternative approach to this problem.
Many thanks in advance.
Upvotes: 1
Views: 645
Reputation: 862641
You can use a custom lambda function that sums a boolean condition, since each True counts as 1. For Country, the lambda function is removed and the built-in 'count' is used instead:
CountHighIncome = lambda x: (x > 100).sum()

aggregations = {
    'Country': {'count'},
    'Income': {'min', 'max', 'mean', 'median', CountHighIncome}
}
incomeSummary = regionGroup.agg(aggregations)
print(incomeSummary)
  Region Income                           Country
            max  min <lambda> mean median   count
0      X    300  100        2  200    200       3
1      Y    200  100        1  150    150       2
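As a self-contained variant, the same idea can be sketched with named aggregation, which should be available in pandas 0.25 and later. This is only an illustration, not part of the original answer: the helper count_high_income, its threshold parameter, and the output column names (CountryCount, IncomeMin, and so on) are assumptions chosen for readability. Named aggregation gives each result column an explicit label instead of <lambda>, and the column order follows the keyword order rather than the arbitrary iteration order of a set.

```python
import pandas as pd

raw_data = {
    'Country': ['A', 'B', 'C', 'D', 'E'],
    'Region': ['X', 'X', 'X', 'Y', 'Y'],
    'Income': [100, 200, 300, 100, 200],
}
incomeData = pd.DataFrame(raw_data)

# Comparing a Series with a scalar yields booleans; summing them
# treats every True as 1, so the sum is a conditional count.
high_income_mask = incomeData['Income'] > 100
total_high_income = int(high_income_mask.sum())


def count_high_income(incomes, threshold=100):
    """Hypothetical helper: count values strictly above `threshold`."""
    return int((incomes > threshold).sum())


# Named aggregation: output_column=(input_column, aggregation).
# The output column names here are illustrative choices.
incomeSummary = incomeData.groupby('Region', as_index=False).agg(
    CountryCount=('Country', 'count'),
    IncomeMin=('Income', 'min'),
    IncomeMax=('Income', 'max'),
    IncomeMean=('Income', 'mean'),
    IncomeMedian=('Income', 'median'),
    CountHighIncome=('Income', count_high_income),
)
print(incomeSummary)
```

With the sample data, CountHighIncome comes out as 2 for region X and 1 for region Y, matching the lambda-based result above.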
Upvotes: 1