I am trying to create a new column for the lower confidence interval using other values in the row. I have written (and released) the confidence interval calculations as a package, public-health-cis, on PyPI. These functions take float values and return a float.
In my analysis script, I am trying to call this function from a pandas DataFrame. I have tried several approaches to get this working, to no avail.
df_for_ci_calcs = df[['Value', 'Count', 'Denominator']].copy()
df_for_ci_calcs = df_for_ci_calcs.applymap(lambda x: -1 if x == '*' else x)
df_for_ci_calcs = df_for_ci_calcs.astype(float)
df['LowerCI'].apply(lambda x: public_health_cis.wilson_lower(df_for_ci_calcs['Value'].astype(float),
                                                             df_for_ci_calcs['Count'].astype(float),
                                                             df_for_ci_calcs['Denominator'].astype(float), indicator.rate))
This comes back with the following traceback:
Internal Server Error: /
df['LowerCI'].apply(lambda x: public_health_cis.wilson_lower(df_for_ci_calcs['Value'].astype(float), df_for_ci_calcs['Count'].astype(float), df_for_ci_calcs['Denominator'].astype(float), indicator.rate))
TypeError: cannot convert the series to <class 'float'>
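The TypeError comes from passing whole columns: df_for_ci_calcs['Value'] is a Series, so a function that expects one float ends up calling float() on the entire column. A minimal sketch, using a hypothetical helper needs_float in place of the real CI function:

```python
import pandas as pd


def needs_float(x):
    """Hypothetical helper: like the CI functions, it expects one scalar."""
    return float(x) * 2


column = pd.Series([1.0, 2.0, 3.0])

# Passing the whole column: float() cannot turn a multi-element Series
# into a single number, so pandas raises TypeError.
raised = False
try:
    needs_float(column)
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# Passing a single element works, because it is already a scalar.
print(needs_float(column.iloc[0]))  # 2.0
```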
I have also tried using:
df['LowerCI'] = df_for_ci_calcs.applymap(lambda x: public_health_cis.wilson_lower(df_for_ci_calcs['Value'], df_for_ci_calcs['Count'],
                                                                                  df_for_ci_calcs['Denominator'], indicator.rate), axis=1)
which delivers the error:
applymap() got an unexpected keyword argument 'axis'
When I take the axis kwarg out, I get the same error as with the first method. So, how do I pass the values from each row into a function to get a value based on the data in that row?
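The applymap error has a related cause: applymap calls its function once per cell with a scalar, so it has no axis parameter, whereas DataFrame.apply with axis=1 calls it once per row with a Series. A small sketch of the difference (the toy frame is made up; DataFrame.map is the newer name for applymap in pandas 2.1 and later, and the code falls back to applymap on older versions):

```python
import pandas as pd

df = pd.DataFrame({"Count": [4.0, 5.0], "Denominator": [7.0, 8.0]})

# Element-wise: the function is called once per cell with a scalar,
# so there is no row/column axis to choose.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
cell_types = elementwise(lambda value: type(value).__name__)
print(cell_types)

# Row-wise: with axis=1 the function is called once per row with a Series,
# so individual fields can be read by column name.
row_types = df.apply(lambda row: type(row).__name__, axis=1)
ratios = df.apply(lambda row: row["Count"] / row["Denominator"], axis=1)
print(row_types)
print(ratios)
```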
I think you need apply with axis=1 to process by rows; each row is then passed to the function as a Series, so the inputs arrive as floats:
df['LowerCI'] = df[['Value', 'Count', 'Denominator']] \
    .replace('*', -1) \
    .astype(float) \
    .apply(lambda x: public_health_cis.wilson_lower(x['Value'],
                                                    x['Count'],
                                                    x['Denominator'],
                                                    indicator.rate),
           axis=1)
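To try the row-wise pattern without installing the package, the sketch below swaps in a hypothetical wilson_lower_stub that computes the standard Wilson score lower bound with z = 1.96. Its signature (count, denominator, multiplier) is an assumption and is not the public_health_cis API; it also treats a non-positive denominator, such as the -1 placeholder, as missing and returns NaN.

```python
import math

import pandas as pd


def wilson_lower_stub(count, denominator, multiplier=100, z=1.96):
    """Hypothetical stand-in for public_health_cis.wilson_lower.

    Lower bound of the Wilson score interval for count / denominator,
    scaled by `multiplier`. The real package's signature may differ.
    """
    # Treat placeholder (-1), missing (NaN) or impossible inputs as "no estimate".
    if not (denominator > 0 and 0 <= count <= denominator):
        return float("nan")
    p = count / denominator
    z2 = z * z
    centre = p + z2 / (2 * denominator)
    margin = z * math.sqrt(p * (1 - p) / denominator + z2 / (4 * denominator ** 2))
    return (centre - margin) / (1 + z2 / denominator) * multiplier


df = pd.DataFrame({"Count": [4, 5, 6], "Denominator": [7, 8, "*"]})

# axis=1: each row reaches the lambda as a Series, so x["Count"] is a float.
df["LowerCI"] = (
    df[["Count", "Denominator"]]
    .replace("*", -1)
    .astype(float)
    .apply(lambda x: wilson_lower_stub(x["Count"], x["Denominator"], 100), axis=1)
)
print(df)
```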
Sample (for simplicity, I changed indicator.rate to the scalar 100):
import pandas as pd
import public_health_cis

df = pd.DataFrame({'Value':['*',2,3],
                   'Count':[4,5,6],
                   'Denominator':[7,8,'*'],
                   'D':[1,3,5],
                   'E':[5,3,6],
                   'F':[7,4,3]})
print (df)
   Count  D  Denominator  E  F  Value
0      4  1            7  5  7      *
1      5  3            8  3  4      2
2      6  5            *  6  3      3
df['LowerCI'] = df[['Value', 'Count', 'Denominator']] \
    .replace('*', -1) \
    .astype(float) \
    .apply(lambda x: public_health_cis.wilson_lower(x['Value'],
                                                    x['Count'],
                                                    x['Denominator'],
                                                    100), axis=1)
print (df)
   Count  D  Denominator  E  F  Value    LowerCI
0      4  1            7  5  7      *  14.185885
1      5  3            8  3  4      2  18.376210
2      6  5            *  6  3      3  99.144602
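Because the Wilson bound is plain arithmetic, it can also be computed on whole columns with NumPy instead of calling a Python function once per row, which is usually much faster on large frames. This is a swapped-in vectorized technique, not how public_health_cis works; the function name wilson_lower_vectorized, the z = 1.96 default and the NaN handling for invalid rows are assumptions.

```python
import numpy as np
import pandas as pd


def wilson_lower_vectorized(count, denominator, multiplier=100, z=1.96):
    """Hypothetical column-wise Wilson lower bound (not the package's code).

    Operates on whole arrays at once instead of calling Python once per row.
    """
    k = np.asarray(count, dtype=float)
    n = np.asarray(denominator, dtype=float)
    valid = (n > 0) & (k >= 0) & (k <= n)
    z2 = z * z
    # Invalid rows may divide by zero or take sqrt of a negative; silence
    # those warnings because such rows are replaced with NaN below.
    with np.errstate(divide="ignore", invalid="ignore"):
        p = k / n
        centre = p + z2 / (2 * n)
        margin = z * np.sqrt(p * (1 - p) / n + z2 / (4 * n ** 2))
        lower = (centre - margin) / (1 + z2 / n) * multiplier
    return np.where(valid, lower, np.nan)


df = pd.DataFrame({"Count": [4, 5, 6], "Denominator": [7, 8, "*"]})
numeric = df[["Count", "Denominator"]].replace("*", -1).astype(float)
df["LowerCI"] = wilson_lower_vectorized(numeric["Count"], numeric["Denominator"])
print(df)
```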