Reputation: 345
I have a function that requires three arguments:
def R0(confirm, suspect, t):
    p = 0.695
    si = 7.5
    yt = suspect * p + confirm
    lamda = math.log(yt) / t
    R0 = 1 + lamda * si + p * (1 - p) * pow(lamda * si, 2)
    return R0
And a dataframe with three columns:
data = {'confirm': ['41', '41', '43', '44'],
        'suspect': ['0', '0', '0', '10'],
        't': ['0', '1', '2', '3']
        }
df = pd.DataFrame(data, columns=['confirm', 'suspect', 't'])
I would like to use each row (with three columns, and hence three values) as the argument values for the function. Finally, I would like to loop over rows of the dataframe and return a list.
For instance, the results should look like:
result = [R0_Value1, R0_Value2, R0_Value3, ....] where
R0_Value1 = R0(41, 0, 0)
R0_Value2 = R0(41, 0, 1)
R0_Value3 = R0(43, 0, 2)
...
I figure it probably has something to do with pandas.DataFrame.apply and * (argument unpacking), but I am new to Python and could not work out how to do it. Could someone please help?
Upvotes: 2
Views: 5925
Reputation: 13387
You can do:
df["formula"]=df.apply(lambda x: R0(*x), axis=1)
The whole thing (there were a couple of other things that needed polishing):
import pandas as pd
import math

def R0(confirm, suspect, t):
    p = 0.695
    si = 7.5
    yt = suspect * p + confirm
    lamda = math.log(yt) / max(t, 1)  # you need to handle division by 0 somehow
    R = 1 + lamda * si + p * (1 - p) * math.pow(lamda * si, 2)
    return R

data = {'confirm': ['41', '41', '43', '44'],
        'suspect': ['0', '0', '0', '10'],
        't': ['0', '1', '2', '3']
        }
# note: the columns have to be numeric for the arithmetic done in R0
df = pd.DataFrame(data, columns=['confirm', 'suspect', 't']).astype(int)
df["formula"] = df.apply(lambda x: R0(*x), axis=1)
Outputs:
   confirm  suspect  t     formula
0       41        0  0  193.285511
1       41        0  1  193.285511
2       43        0  2   57.274157
3       44       10  3   31.297989
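Since the question asks for a plain list rather than a new column, here is a minimal sketch of two ways to get one. It reuses the R0 from this answer, including the max(t, 1) guard, which is this answer's workaround for t == 0 and not part of the original formula. The names result and result_zip are only illustrative.

```python
import math
import pandas as pd

def R0(confirm, suspect, t):
    p = 0.695
    si = 7.5
    yt = suspect * p + confirm
    lamda = math.log(yt) / max(t, 1)  # same guard against t == 0 as above
    return 1 + lamda * si + p * (1 - p) * math.pow(lamda * si, 2)

data = {'confirm': ['41', '41', '43', '44'],
        'suspect': ['0', '0', '0', '10'],
        't': ['0', '1', '2', '3']}
df = pd.DataFrame(data, columns=['confirm', 'suspect', 't']).astype(int)

# Option 1: apply row by row, then turn the resulting Series into a list.
# Selecting the three columns explicitly keeps *row at exactly three values,
# even if other columns are added to df later.
result = df[['confirm', 'suspect', 't']].apply(lambda row: R0(*row), axis=1).tolist()

# Option 2: skip apply and iterate over the three columns in parallel.
result_zip = [R0(c, s, t) for c, s, t in zip(df['confirm'], df['suspect'], df['t'])]
```

Both variants should give the same four values as the formula column shown above.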
Upvotes: 2
Reputation: 22493
If you insist on using pandas, you can also do the calculation directly with numpy, without a function:
import numpy as np

df = pd.DataFrame(data, columns=['confirm', 'suspect', 't']).astype(int)
p = 0.695
si = 7.5
# note: t == 0 in the first row makes the division produce inf
df['results'] = 1 + (np.log(df["suspect"]*p + df["confirm"])/df["t"])*si \
    + p*(1-p)*np.power((np.log(df["suspect"]*p + df["confirm"])/df["t"])*si, 2)
print(df)
# Output:
   confirm  suspect  t     results
0       41        0  0         inf
1       41        0  1  193.285511
2       43        0  2   57.274157
3       44       10  3   31.297989
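The inf in the first row comes from dividing by t == 0. One possible way around it, sketched below, is to clip t to a minimum of 1 before dividing. This is an assumption borrowed from the max(t, 1) guard in the other answer, not something the formula itself prescribes. The names t_safe and lam_si are illustrative.

```python
import numpy as np
import pandas as pd

data = {'confirm': ['41', '41', '43', '44'],
        'suspect': ['0', '0', '0', '10'],
        't': ['0', '1', '2', '3']}
df = pd.DataFrame(data, columns=['confirm', 'suspect', 't']).astype(int)

p = 0.695
si = 7.5

# Assumption: treat t == 0 as t == 1 so the division stays finite.
t_safe = df['t'].clip(lower=1)

# Compute lamda * si once and reuse it in both terms.
lam_si = np.log(df['suspect'] * p + df['confirm']) / t_safe * si
df['results'] = 1 + lam_si + p * (1 - p) * lam_si ** 2

# The question asked for a list, so convert the column at the end.
results = df['results'].tolist()
```

Whether clipping is the right treatment for t == 0 depends on what t means in the model; dropping that row or leaving it as inf or NaN may be more appropriate.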
Upvotes: 1
Reputation: 304
You were looking in the right direction with 'apply':
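# Convert the values to int (they are strings now, which would raise a TypeError in R0).
# astype(int) is used here because applymap is deprecated in recent pandas versions.
df = df.astype(int)
df['results'] = df.apply(lambda x: R0(x.confirm, x.suspect, x.t), axis=1)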
When you use apply with axis=1, each row is passed as a single argument (a pandas Series) to the function you specify. The lambda is basically a wrapper that takes this single argument (x), pulls out the three values, and passes them in the correct order to the next function, R0.
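To see this for yourself, here is a small sketch. The functions describe and add_three are hypothetical stand-ins (add_three takes the place of R0 so the numbers are easy to check by hand), and the two-row dataframe is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'confirm': [41, 43], 'suspect': [0, 10], 't': [1, 3]})

def describe(row):
    # With axis=1, `row` is a pandas Series indexed by the column names.
    return f"{type(row).__name__}: {list(row.index)}"

seen = df.apply(describe, axis=1).tolist()

def add_three(confirm, suspect, t):
    # Hypothetical stand-in for R0: any function of three arguments works.
    return confirm + suspect + t

# Pick the values by column name...
by_name = df.apply(lambda x: add_three(x.confirm, x.suspect, x.t), axis=1).tolist()

# ...or unpack the row positionally with *, which relies on the column order.
by_unpack = df.apply(lambda x: add_three(*x), axis=1).tolist()
```

Accessing the values by name is a bit more verbose, but it keeps working if columns are reordered or new ones are added, whereas *x passes every column in the row.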
Upvotes: 3
Reputation: 345
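df.apply(lambda x: R0(x.iloc[0], x.iloc[1], x.iloc[2]), axis=1)
gives the right result once the columns have been converted to numbers. (x.iloc[0] is the explicit positional form of x[0], which newer pandas versions warn about.)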
Upvotes: 0