RustyBrain

Reputation: 125

Passing column values into lambda function in Pandas

I am trying to create a new column for the lower confidence interval using other values in the same row. I have written (and released) the confidence interval calculations as a package, public-health-cis, on PyPI. These functions take float values and return a float.

In my analysis script, I am trying to call this function from a pandas dataframe. I have tried several options to attempt to get this working, to no avail.

    df_for_ci_calcs = df[['Value', 'Count', 'Denominator']].copy()
    df_for_ci_calcs = df_for_ci_calcs.applymap(lambda x: -1 if x == '*' else x)
    df_for_ci_calcs = df_for_ci_calcs.astype(np.float)
    df['LowerCI'].apply(lambda x: public_health_cis.wilson_lower(df_for_ci_calcs['Value'].astype(float),
                                      df_for_ci_calcs['Count'].astype(float), 
                                      df_for_ci_calcs['Denominator'].astype(float), indicator.rate))

This comes back with the following traceback:

Internal Server Error: /

df['LowerCI'].apply(lambda x: public_health_cis.wilson_lower(df_for_ci_calcs['Value'].astype(float), df_for_ci_calcs['Count'].astype(float), df_for_ci_calcs['Denominator'].astype(float), indicator.rate))

TypeError: cannot convert the series to <class 'float'>

I have also tried using:

df['LowerCI'] = df_for_ci_calcs.applymap(lambda x: public_health_cis.wilson_lower(df_for_ci_calcs['Value'], df_for_ci_calcs['Count'],
                                                         df_for_ci_calcs['Denominator'], indicator.rate), axis=1)

which delivers the error:

applymap() got an unexpected keyword argument 'axis'

When I take the axis kwarg out, I get the same error as the first method. So, how do I pass values from each row into a function to get a value based on the data in those rows?

Upvotes: 1

Views: 6710

Answers (1)

jezrael

Reputation: 862406

I think you need apply with axis=1 to process the data row by row, so that the function receives scalar floats from each row instead of whole columns:

df['LowerCI'] = df[['Value', 'Count', 'Denominator']] \
                .replace('*', -1) \
                .astype(float) \
                .apply(lambda x: public_health_cis.wilson_lower(x['Value'],
                                                                x['Count'],
                                                                x['Denominator'],
                                                                indicator.rate),
                       axis=1)
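
To see why passing whole columns fails while the row-wise call works, here is a minimal sketch that runs without the package installed. `wilson_lower_stub` is a hypothetical stand-in written only for illustration (a textbook Wilson score lower limit). It is not the `public_health_cis` implementation, and the sample numbers are made up.

```python
import math

import pandas as pd


def wilson_lower_stub(value, count, denominator, rate, z=1.96):
    """Hypothetical stand-in, NOT the public_health_cis implementation.

    `value` is accepted only to mirror the call signature used above.
    """
    # float() works on a scalar but raises TypeError on a multi-row Series.
    count = float(count)
    denominator = float(denominator)
    centre = 2 * count + z ** 2
    spread = z * math.sqrt(z ** 2 + 4 * count * (1 - count / denominator))
    return (centre - spread) / (2 * (denominator + z ** 2)) * rate


# Made-up sample data for the sketch.
df = pd.DataFrame({'Value': [57.1, 62.5, 60.0],
                   'Count': [4, 5, 6],
                   'Denominator': [7, 8, 10]})

# Passing whole columns hands the function Series objects, not floats.
try:
    wilson_lower_stub(df['Value'], df['Count'], df['Denominator'], 100)
    whole_column_error = None
except TypeError as exc:
    whole_column_error = str(exc)

# With axis=1 the lambda receives one row at a time, so each lookup is a scalar.
df['LowerCI'] = df.apply(
    lambda row: wilson_lower_stub(row['Value'], row['Count'],
                                  row['Denominator'], 100),
    axis=1)
```

The lambda's argument is a row (a Series indexed by column name), which is why `row['Count']` yields a single number that `float()` can handle.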

Sample (for simplicity, I changed indicator.rate to the scalar 100):

import pandas as pd
import public_health_cis

df = pd.DataFrame({'Value':['*',2,3],
                   'Count':[4,5,6],
                   'Denominator':[7,8,'*'],
                   'D':[1,3,5],
                   'E':[5,3,6],
                   'F':[7,4,3]})

print (df)
   Count  D Denominator  E  F Value
0      4  1           7  5  7     *
1      5  3           8  3  4     2
2      6  5           *  6  3     3

df['LowerCI'] = df[['Value', 'Count', 'Denominator']] \
                .replace('*', -1) \
                .astype(float) \
                .apply(lambda x: public_health_cis.wilson_lower(x['Value'],
                                                                x['Count'], 
                                                                x['Denominator'],  
                                                                100), axis=1)

print (df)
   Count  D Denominator  E  F Value    LowerCI
0      4  1           7  5  7     *  14.185885
1      5  3           8  3  4     2  18.376210
2      6  5           *  6  3     3  99.144602
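
Note that replacing '*' with -1 still produces a number for the suppressed rows. If you would rather leave those rows empty, one possible variation is to coerce '*' to NaN and skip the calculation for incomplete rows. This is only a sketch: `placeholder_ci` is a hypothetical function standing in for `public_health_cis.wilson_lower` so the example runs on its own, and it simply returns count / denominator * rate.

```python
import numpy as np
import pandas as pd


def placeholder_ci(value, count, denominator, rate):
    # Hypothetical placeholder, NOT a real confidence interval calculation.
    return count / denominator * rate


df = pd.DataFrame({'Value': ['*', 2, 3],
                   'Count': [4, 5, 6],
                   'Denominator': [7, 8, '*']})

# Convert each column to numeric; anything unparseable (such as '*') becomes NaN.
numeric = df[['Value', 'Count', 'Denominator']].apply(pd.to_numeric,
                                                      errors='coerce')


def ci_or_nan(row):
    # Skip rows where any input was suppressed.
    if row.isna().any():
        return np.nan
    return placeholder_ci(row['Value'], row['Count'], row['Denominator'], 100)


df['LowerCI'] = numeric.apply(ci_or_nan, axis=1)
```

Rows that contained '*' end up with NaN in LowerCI, so they are easy to filter out or report as suppressed later.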

Upvotes: 4
