Reputation: 1960
I need to do a lot of aggregation on data and I was hoping to write a function that would allow me to pass
1) The string to use for grouping 2) The fields that would constitute the numerator/denominator/ and formula
As I will be doing a lot of cuts on the data using different groupings and different numerators and denominators, it would be easier for me to create a generic group by and pass it what I need
So lets take the following example:
import pandas as pd
df=pd.read_csv("https://raw.githubusercontent.com/wesm/pydata-book/master/ch08/tips.csv", sep=',')
(df.groupby(['sex', 'smoker'])[['total_bill','tip']].sum().apply(lambda r: r.tip/r.total_bill, axis = 1))
Now, I would want to create a function that would allow me to pass a group by value and a numerator denominator field
So, for example
groupbyvalue=['sex', 'smoker']
fieldstoaggregate=['tip','total_bill']
And plug them into something like
(df.groupby(groupbyvalue)[fieldstoaggregate].sum().apply(lambda r: r.tip/r.total_bill, axis = 1))
That works fine, but when I tried to replace the formula with something like:
dfformula="r.tip/r.total_bill"
And then placed it in the formula as follows
(df.groupby(groupbyvalue)[fieldstoaggregate].sum().apply(lambda r: dfformula, axis = 1)*10000)
My output looks as follows:
sex smoker
Female No r.tip/r.total_billr.tip/r.total_billr.tip/r.to...
Yes r.tip/r.total_billr.tip/r.total_billr.tip/r.to...
Male No r.tip/r.total_billr.tip/r.total_billr.tip/r.to...
Yes r.tip/r.total_billr.tip/r.total_billr.tip/r.to...
dtype: object
Is there any way to create the calculation dynamically then use it in the formula rather than having it interpreted as a string?
Thanks
Upvotes: 0
Views: 304
Reputation: 443
You can achieve this using eval()
function
import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/wesm/pydata-book/master/ch08/tips.csv", sep=',')
groupbyvalue = ['sex', 'smoker']
fieldstoaggregate = ['tip','total_bill']
dfformula = "r.tip/r.total_bill"
(df.groupby(groupbyvalue)[fieldstoaggregate].sum().apply(lambda r: eval(dfformula), axis = 1))
The output would be as follows
sex smoker
Female No 0.153189
Yes 0.163062
Male No 0.157312
Yes 0.136919
dtype: float64
Upvotes: 2