Reputation: 14062
I have a df which looks like this:
df
dim_pptx qp_pptx diff
Absolute Radio 7.39 7.53 0.14
BBC Asian Network 0.13 0.13 0.00
BBC Radio 1 14.41 14.55 0.14
BBC Radio 1Xtra 0.57 0.58 0.01
BBC Radio 2 23.36 23.39 0.03
I want to add a new column which contains values based on df['diff']
Expected output:
df
dim_pptx qp_pptx diff sig
Absolute Radio 7.39 7.53 0.14 **
BBC Asian Network 0.13 0.13 0.00 -
BBC Radio 1 14.41 14.55 0.14 **
BBC Radio 1Xtra 0.57 0.58 0.01 -
BBC Radio 2 23.36 23.39 0.03 *
so the condition would be:
    if value > 0.1:
        value = '**'
    elif value > 0.02:
        value = '*'
    else:
        value = '-'
my attempt:
comp_df['sig'] = comp_df.apply(lambda x : '*' if comp_df['diff'] > 0.01 else '', axis=0)
error:
ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', u'occurred at index dim_pptx')
Upvotes: 2
Views: 2816
Reputation: 393963
You can set all the values that meet each criterion directly with boolean masks, rather than looping over the df by calling apply.
The following should work and, because it is vectorised, will scale better for larger datasets:
df.loc[df['diff'] > 0.1,'sig'] = '**'
df.loc[(df['diff'] > 0.02) & (df['diff'] <= 0.1), 'sig'] = '*'
df.loc[df['diff'] <= 0.02, 'sig'] = '-'
This sets every row that meets each criterion. The problem with apply is that it is essentially syntactic sugar for a for loop, which should be avoided where a vectorised solution exists.
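As a possible alternative to the three separate assignments, the same thresholds can be written in one vectorised call with numpy.select. This is only a sketch: the DataFrame is rebuilt from the sample data in the question, and the names conditions and choices are made up for illustration. numpy.select uses the first condition that matches, so the order of the list reproduces the if/elif/else logic.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "dim_pptx": [7.39, 0.13, 14.41, 0.57, 23.36],
        "qp_pptx": [7.53, 0.13, 14.55, 0.58, 23.39],
        "diff": [0.14, 0.00, 0.14, 0.01, 0.03],
    },
    index=[
        "Absolute Radio",
        "BBC Asian Network",
        "BBC Radio 1",
        "BBC Radio 1Xtra",
        "BBC Radio 2",
    ],
)

# Conditions are checked in order; the first one that is True wins.
conditions = [df["diff"] > 0.1, df["diff"] > 0.02]
choices = ["**", "*"]

# Rows matching neither condition get the default '-'.
df["sig"] = np.select(conditions, choices, default="-")

print(df)
```

With the sample data this should give '**', '-', '**', '-', '*' in the sig column, matching the expected output in the question.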
Upvotes: 1
Reputation: 90889
When using DataFrame.apply with axis=0, the function is applied to each column; to have apply go through each row, you need axis=1.
That said, you can use Series.apply instead of DataFrame.apply on the 'diff' series. Example:
comp_df['sig'] = comp_df['diff'].apply(lambda x: '**' if x > 0.1 else '*' if x > 0.02 else '-')
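To illustrate the axis=1 point, here is a small sketch comparing the row-wise DataFrame.apply with the Series.apply version. The comp_df here is rebuilt from the 'diff' values in the question, and classify is a hypothetical helper name used only for this example.

```python
import pandas as pd

# Only the 'diff' column from the question is needed for this comparison.
comp_df = pd.DataFrame(
    {"diff": [0.14, 0.00, 0.14, 0.01, 0.03]},
    index=[
        "Absolute Radio",
        "BBC Asian Network",
        "BBC Radio 1",
        "BBC Radio 1Xtra",
        "BBC Radio 2",
    ],
)


def classify(value):
    """Map a single diff value to its significance marker."""
    if value > 0.1:
        return "**"
    elif value > 0.02:
        return "*"
    return "-"


# axis=1 passes each row to the function, so row['diff'] is a scalar.
row_wise = comp_df.apply(lambda row: classify(row["diff"]), axis=1)

# Series.apply passes each scalar value directly, with no axis argument.
series_wise = comp_df["diff"].apply(classify)

print(row_wise.tolist())
print(series_wise.tolist())
```

Both calls should produce the same markers; the Series.apply form is shorter because it never has to pick the column out of a row.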
Upvotes: 2