Reputation: 677
I'm trying to apply this function to a pandas DataFrame to check whether a taxi pickup or dropoff time falls within the range that I created using the arrivemin and arrivemax variables below.
If the time does fall into the range, I want to keep the row. If it's outside the range, I want to drop it from the dataframe.
Start.Time, End.Time etc are all datetime objects so the time functionality should work fine.
def time_function(df, row):
    gametimestart = df['Start.Time']
    gametimeend = df['End.Time']
    arrivemin = gametimestart - datetime.timedelta(minutes=120)
    arrivemax = gametimeend - datetime.timedelta(minutes=30)
    departmin = gametimeend - datetime.timedelta(minutes=60)
    departmax = gametimeend + datetime.timedelta(minutes=90)
    for not i in ((df['pickup_datetime'] > arrivemin) & (df['pickup_datetime'] < arrivemax) & (df['dropoff_datetime'] > departmin) & (df['dropoff_datetime'] < departmax)):
        df = df.drop[df[i.index]]
    return

for index, row in yankdf:
    time_function(yankdf, row)
Keep getting this syntax error:
File "<ipython-input-25-bda6fb2db429>", line 17
for not i in (((row['pickup_datetime'] > arrivemin) & (row['pickup_datetime'] < arrivemax)) | ((row['dropoff_datetime'] > departmin) & (row['dropoff_datetime'] < departmax)):
^
SyntaxError: invalid syntax
Upvotes: 1
Views: 797
Reputation: 1946
I don't think you need the function. Just perform a basic subset and df_filtered should be your filtered dataframe.
import datetime

# Per-row game start and end times (datetime columns)
gametimestart = df['Start.Time']
gametimeend = df['End.Time']
# Arrival and departure windows around each game
arrivemin = gametimestart - datetime.timedelta(minutes=120)
arrivemax = gametimeend - datetime.timedelta(minutes=30)
departmin = gametimeend - datetime.timedelta(minutes=60)
departmax = gametimeend + datetime.timedelta(minutes=90)
# Keep only the rows where every condition holds
df_filtered = df[(df['pickup_datetime'] > arrivemin) &
                 (df['pickup_datetime'] < arrivemax) &
                 (df['dropoff_datetime'] > departmin) &
                 (df['dropoff_datetime'] < departmax)]
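As for the error itself: `for not i in ...` is not valid Python, because `for` expects a target name right after it, so the parser stops at `not`. With a boolean mask, no loop is needed at all. Here is a minimal, self-contained sketch of the same subset on made-up data. The sample rows and the helper name `filter_by_game_window` are assumptions for illustration only; the column names are the ones from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
yankdf = pd.DataFrame({
    "Start.Time": pd.to_datetime(["2016-04-05 13:05"] * 3),
    "End.Time": pd.to_datetime(["2016-04-05 16:05"] * 3),
    "pickup_datetime": pd.to_datetime(
        ["2016-04-05 12:00", "2016-04-05 09:00", "2016-04-05 12:30"]
    ),
    "dropoff_datetime": pd.to_datetime(
        ["2016-04-05 15:30", "2016-04-05 15:30", "2016-04-05 18:00"]
    ),
})


def filter_by_game_window(df):
    """Return the rows whose pickup and dropoff fall inside the game windows."""
    # Windows are computed per row, so each game gets its own range.
    arrivemin = df["Start.Time"] - pd.Timedelta(minutes=120)
    arrivemax = df["End.Time"] - pd.Timedelta(minutes=30)
    departmin = df["End.Time"] - pd.Timedelta(minutes=60)
    departmax = df["End.Time"] + pd.Timedelta(minutes=90)

    # Element-wise boolean mask; each comparison needs its own parentheses.
    mask = (
        (df["pickup_datetime"] > arrivemin)
        & (df["pickup_datetime"] < arrivemax)
        & (df["dropoff_datetime"] > departmin)
        & (df["dropoff_datetime"] < departmax)
    )
    return df[mask]


df_filtered = filter_by_game_window(yankdf)
print(df_filtered)
```

With this sample data only the first row is kept: the second pickup is too early and the third dropoff is too late. If you want rows that match either the arrival window or the departure window, as in the line shown in your traceback, combine the two pairs with `|` instead of `&`.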
Upvotes: 1