Ben Price
Ben Price

Reputation: 677

function won't apply to pandas data frame, getting syntax error

I'm trying to apply this function to a pandas data frame in order to see if a taxi pickup or dropoff time falls within the range that I created using the arrivemin, arrive max variable below.

If the the time does fall into the range, I want to keep the row. If it's outside the range I want to drop it from the dataframe.

Start.Time, End.Time etc are all datetime objects so the time functionality should work fine.

def time_function(df, row):
    gametimestart = df['Start.Time'] 
    gametimeend = df['End.Time'] 
    arrivemin = gametimestart - datetime.timedelta(minutes=120) 
    arrivemax = gametimeend - datetime.timedelta(minutes = 30) 
    departmin = gametimeend - datetime.timedelta(minutes = 60) 
    departmax = gametimeend + datetime.timedelta(minutes = 90)
    for not i in ((df['pickup_datetime'] > arrivemin) & (df['pickupdatetime'] < arrivemax) &(df['dropoff_datetime'] > departmin) & (df['dropoffdatetime'] < departmax)):
        df = df.drop[df[i.index]]
    return


for index, row in yankdf:
    time_function(yankdf, row)

Keep getting this syntax error:

 File "<ipython-input-25-bda6fb2db429>", line 17
    for not i in (((row['pickup_datetime'] > arrivemin) & (row['pickupdatetime'] < arrivemax)) | ((row['dropoff_datetime'] > departmin) & (row['dropoffdatetime'] < departmax)):
          ^
SyntaxError: invalid syntax

Upvotes: 1

Views: 797

Answers (1)

davidshinn
davidshinn

Reputation: 1946

I don't think you need the function. Just perform a basic subset and df_filtered should be your filtered dataframe.

gametimestart = df['Start.Time'] 
gametimeend = df['End.Time'] 
arrivemin = gametimestart - datetime.timedelta(minutes=120) 
arrivemax = gametimeend - datetime.timedelta(minutes = 30) 
departmin = gametimeend - datetime.timedelta(minutes = 60) 
departmax = gametimeend + datetime.timedelta(minutes = 90)
df_filtered = df[(df['pickup_datetime'] > arrivemin) &
                 (df['pickup_datetime'] < arrivemax) &
                 (df['dropoff_datetime'] > departmin) & 
                 (df['dropoffdatetime'] < departmax)]

Upvotes: 1

Related Questions