Boosted_d16

Reputation: 14062

pandas: add value based on another column's value

I have a df which looks like this:

df
                    dim_pptx  qp_pptx  diff
Absolute Radio          7.39     7.53  0.14
BBC Asian Network       0.13     0.13  0.00
BBC Radio 1            14.41    14.55  0.14
BBC Radio 1Xtra         0.57     0.58  0.01
BBC Radio 2            23.36    23.39  0.03

I want to add a new column whose values are based on df['diff'].

Expected output:

df
                    dim_pptx  qp_pptx  diff  sig
Absolute Radio          7.39     7.53  0.14   **
BBC Asian Network       0.13     0.13  0.00    -
BBC Radio 1            14.41    14.55  0.14   **
BBC Radio 1Xtra         0.57     0.58  0.01    -
BBC Radio 2            23.36    23.39  0.03    *

So the condition would be:

if value > 0.1:
    value = '**'
elif value > 0.02:
    value = '*'
else:
    value = '-'

My attempt:

comp_df['sig'] = comp_df.apply(lambda x : '*' if comp_df['diff'] > 0.01 else '', axis=0)

Error:

 ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', u'occurred at index dim_pptx')

Upvotes: 2

Views: 2816

Answers (2)

EdChum

Reputation: 393963

You can set all the values that meet each criterion directly, rather than looping over the df by calling apply. The following should work and, as it's vectorised, will scale better for larger datasets:

df.loc[df['diff'] > 0.1,'sig'] = '**'
df.loc[(df['diff'] > 0.02) & (df['diff'] <= 0.1), 'sig'] = '*'
df.loc[df['diff'] <= 0.02, 'sig'] = '-'

This sets every row that meets each criterion. The problem with apply is that it is essentially syntactic sugar for a for loop, which is best avoided where a vectorised solution exists.
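As a variation on the vectorised idea above, the three assignments can be collapsed into a single call to numpy.select. This is not part of the original answer; it is a minimal sketch that assumes NumPy is available and rebuilds the sample frame from the question so it can be run on its own. Conditions are checked in order and the first match wins, so the thresholds mirror the if/elif/else in the question.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (values copied from the post).
df = pd.DataFrame(
    {
        "dim_pptx": [7.39, 0.13, 14.41, 0.57, 23.36],
        "qp_pptx": [7.53, 0.13, 14.55, 0.58, 23.39],
        "diff": [0.14, 0.00, 0.14, 0.01, 0.03],
    },
    index=[
        "Absolute Radio",
        "BBC Asian Network",
        "BBC Radio 1",
        "BBC Radio 1Xtra",
        "BBC Radio 2",
    ],
)

# Conditions are evaluated in order; the first one that is True
# for a row decides which label that row receives.
conditions = [df["diff"] > 0.1, df["diff"] > 0.02]
labels = ["**", "*"]

# Rows matching no condition fall back to the default label.
df["sig"] = np.select(conditions, labels, default="-")

print(df)
```

Because the first matching condition wins, the second mask does not need the explicit upper bound that the .loc version uses.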

Upvotes: 1

Anand S Kumar

Reputation: 90889

With DataFrame.apply, axis=0 passes each column to the function; to go through each row you need axis=1. Your lambda also ignores its argument x and tests the whole comp_df['diff'] Series, which is what raises the ValueError.

That said, you can use Series.apply on the 'diff' column instead of DataFrame.apply. Example:

comp_df['sig'] = comp_df['diff'].apply(lambda x: '**' if x > 0.1 else '*' if x > 0.02 else '-')
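To make the difference between the two apply variants concrete, here is a small self-contained sketch. The helper name significance is made up for illustration, and the frame holds only the 'diff' values copied from the question. It shows Series.apply receiving one scalar at a time, next to DataFrame.apply with axis=1 receiving one row at a time.

```python
import pandas as pd

# Only the 'diff' column from the question is needed here.
comp_df = pd.DataFrame(
    {"diff": [0.14, 0.00, 0.14, 0.01, 0.03]},
    index=[
        "Absolute Radio",
        "BBC Asian Network",
        "BBC Radio 1",
        "BBC Radio 1Xtra",
        "BBC Radio 2",
    ],
)


def significance(value):
    """Hypothetical helper: map one diff value to its marker."""
    if value > 0.1:
        return "**"
    elif value > 0.02:
        return "*"
    return "-"


# DataFrame.apply with axis=1 passes each row as a Series,
# so the function has to pick out the 'diff' entry itself.
row_wise = comp_df.apply(lambda row: significance(row["diff"]), axis=1)

# Series.apply passes each scalar value straight to the function.
comp_df["sig"] = comp_df["diff"].apply(significance)

print(comp_df)
```

Both calls produce the same labels; the Series.apply form is shorter because it never has to build a row object.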

Upvotes: 2
