Reputation: 148
It seems I can apply some functions to a DataFrame without problems, but others raise a ValueError.
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
data = np.random.randn(6, 4)
df = pd.DataFrame(data, index=dates, columns=list('ABCD'))

def my_max(y):
    return max(y, 0)

def times_ten(y):
    return 10 * y

df.apply(lambda x: times_ten(x))  # Works fine
df.apply(lambda x: my_max(x))     # Doesn't work
The first apply works fine, the second one generates a:
ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', u'occurred at index A')
I know I can generate the "max(df,0)" in other ways (e.g. by df[df<0]=0), so I'm not looking for a solution to this particular problem. Rather, I'm interested in why the apply above doesn't work.
Upvotes: 3
Views: 3052
Reputation: 77941
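The built-in max cannot handle a scalar and an array together. Here apply passes each column to my_max as a whole Series, and the comparison max performs on it is element-wise, so it yields a boolean Series with no single truth value: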
>>> max(df['A'], 0)
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
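To make the failure concrete, here is a minimal sketch of what happens inside the built-in max. The Series `s` and the variable names are made up for illustration; they are not from the question.

```python
import pandas as pd

# Hypothetical stand-in for one column of the DataFrame.
s = pd.Series([-1.0, 2.0, -3.0])

# An ordering comparison on a Series is element-wise:
# it returns a boolean Series, not a single True/False.
mask = s > 0
print(mask.tolist())

# max() needs one truth value from that comparison, and pandas refuses to guess.
try:
    bool(mask)
    bool_raised = False
except ValueError as exc:
    bool_raised = True
    print(exc)

# The same error surfaces when max() is handed the Series and a scalar.
try:
    max(s, 0)
    max_raised = False
except ValueError:
    max_raised = True
```

times_ten does not hit this because `10 * y` is plain element-wise arithmetic and never asks for a truth value.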
Either use np.maximum, which computes the element-wise maximum:
>>> def my_max(y):
...     return np.maximum(y, 0)
...
>>> df.apply(lambda x: my_max(x))
                A      B      C      D
2013-01-01  0.000  0.000  0.178  0.992
2013-01-02  0.000  1.060  0.000  0.000
2013-01-03  0.528  2.408  2.679  0.000
2013-01-04  0.564  0.573  0.320  1.220
2013-01-05  0.903  0.497  0.000  0.032
2013-01-06  0.505  0.000  0.000  0.000
Or use .applymap, which operates element-wise, so my_max receives one scalar at a time:
>>> def my_max(y):
...     return max(y, 0)
...
>>> df.applymap(lambda x: my_max(x))
                A      B      C      D
2013-01-01  0.000  0.000  0.178  0.992
2013-01-02  0.000  1.060  0.000  0.000
2013-01-03  0.528  2.408  2.679  0.000
2013-01-04  0.564  0.573  0.320  1.220
2013-01-05  0.903  0.497  0.000  0.032
2013-01-06  0.505  0.000  0.000  0.000
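As a rough cross-check, the two approaches above can be run side by side on the same data. This is only a sketch: the seeded generator and the variable names are my own, `df.clip(lower=0)` is an extra alternative not discussed above, and the `hasattr` fallback assumes a recent pandas where `applymap` is deprecated in favour of `DataFrame.map`.

```python
import numpy as np
import pandas as pd

# Seeded data so the comparison is repeatable (the question used np.random.randn).
rng = np.random.default_rng(0)
dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list("ABCD"))

# Column-wise: np.maximum gets a whole Series and works element-wise on it.
via_maximum = df.apply(lambda col: np.maximum(col, 0))

# Element-wise: the built-in max only ever sees one scalar at a time.
# Use DataFrame.map where it exists, otherwise fall back to applymap.
elementwise = df.map if hasattr(df, "map") else df.applymap
via_elementwise = elementwise(lambda v: max(v, 0.0))

# Assumption: clip is a vectorized alternative that gives the same result.
via_clip = df.clip(lower=0)

print(via_maximum.equals(via_elementwise))
print(via_maximum.equals(via_clip))
```

The element-wise route calls a Python function once per cell, so on large frames the vectorized np.maximum or clip versions are usually the better choice.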
Upvotes: 4