Reputation: 325
Let's say I have a Pandas DataFrame df:
start_time Event
0 0
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
I want to set the value of the Event column to -1 when the corresponding start_time lies between two values, so I define this function:
def test(time):
    if (time['start_time'] >= 5) and (time['start_time'] <= 8):
        return -1
    else:
        return time
To apply this to the event column, I do the following:
df[['Event']] = df[['Event']].apply(test,axis=1)
which yields this error: KeyError: ('start_time', 'occurred at index 0')
Why is this happening? Should be a simple fix.
Upvotes: 3
Views: 4383
Reputation: 4821
The function that you are passing to .apply() reads the start_time field of its input argument (in the conditional check if (time['start_time'] >= 5) and (time['start_time'] <= 8)). So it must be applied to a DataFrame or Series that has a start_time column.
However, before you call apply you first select df[['Event']], which returns a one-column DataFrame (the double brackets keep it a DataFrame rather than a Series). With axis=1, df[['Event']].apply() passes each row of that DataFrame to the function as a Series whose only label is Event. When the function reaches the expression time['start_time'], it looks for a start_time label in that row, can't find it (because only the Event column was kept), and raises a KeyError.
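To see this for yourself, here is a minimal sketch that rebuilds the example frame (the construction of df is my assumption about how the data looks; the names subset and first_row are just for illustration) and inspects what a single row of df[['Event']] contains:

```python
import pandas as pd

# Assumed reconstruction of the frame from the question.
df = pd.DataFrame({"start_time": range(10), "Event": 0})

subset = df[["Event"]]        # double brackets: still a DataFrame
first_row = subset.iloc[0]    # roughly what apply(..., axis=1) hands to the function

print(type(subset).__name__)  # DataFrame
print(list(first_row.index))  # ['Event']

# Looking up a label that is not in the row fails, just like in the question.
try:
    first_row["start_time"]
except KeyError as err:
    print("KeyError:", err)
```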
The solution is to pass a DataFrame or a Series that has a start_time column in it. In your case you want to apply the function to the entire DataFrame, so replace df[['Event']] with the whole DataFrame df:
df = df.apply(test, axis=1)
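and change your function to set the Event value on the row and then return the row. Replace return -1 with time['Event'] = -1, drop the else branch (i.e., don't change anything if the conditions aren't met), and end the function with an unconditional return time. The return matters because apply builds its result from whatever the function returns; a function that returns nothing would leave you with only None values.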
Upvotes: 1