jenryb

Reputation: 2117

Ignoring future dates in python

I have a large database, and I want my Python code to read only the last week of data.

However, somebody made a typo in the database so there is a date in the future that is throwing everything off.

Input:

recvd_dttm
6/5/2015 18:28:50 PM
6/5/2015 14:25:43 PM
9/10/2015 21:45:12 PM
6/5/2015 14:30:43 PM
6/5/2015 14:32:33 PM
6/5/2015 14:33:45 PM

Code so far:

import datetime
import pandas as pd

# Create a dataframe with the data we are interested in
df1 = pd.read_csv('MYDATA.csv')

#This section selects the last week of data
# convert strings to datetimes
df1['recvd_dttm'] = pd.to_datetime(df1['recvd_dttm'])


# get first and last datetime for final week of data   
range_max = df1['recvd_dttm'].max()
range_min = range_max - datetime.timedelta(days=7)

# take slice with final week of data
df2 = df1[(df1['recvd_dttm'] >= range_min) & 
               (df1['recvd_dttm'] <= range_max)]

I want to ignore all dates in the future. I have tried wrapping the code in a try/except that catches IndexError, but this didn't work, as the IndexError was only raised later in the code.

I have also tried an if statement

if df1['recvd_dttm'].max() > datetime.datetime.now():

but these values aren't comparable, and I don't know how to select the penultimate value for the date, as max()-1 doesn't work, obviously. Does anyone have any ideas? Thanks in advance!

Upvotes: 1

Views: 789

Answers (2)

abeboparebop

Reputation: 7755

I believe your issue is that to_datetime isn't working the way you expect it to. You need to tell it the specific date format to expect.

import datetime
import pandas as pd

# prepare the dataframe
dates = ['6/5/2015 18:28:50 PM', '6/5/2015 14:25:43 PM', '9/10/2015 21:45:12 PM', '6/5/2015 14:30:43 PM', '6/5/2015 14:32:33 PM', '6/5/2015 14:33:45 PM']
df1 = pd.DataFrame({"recvd_dttm": dates})

# properly convert dates
df1['recvd_dttm'] = pd.to_datetime(df1['recvd_dttm'], format='%m/%d/%Y %H:%M:%S %p')

# drop rows with dates in the future
df1 = df1[df1['recvd_dttm'] < datetime.datetime.now()]
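One way to check whether the conversion actually worked is to look at the resulting dtype and count the values that failed to parse. The following is only a minimal sketch: the sample strings (including the deliberately bad one) are made up for illustration, the AM/PM suffix is left out to keep the format simple, and errors="coerce" is my choice here, not something the question uses.

```python
import pandas as pd

# Hypothetical sample values; "not a date" stands in for a malformed entry
raw = pd.Series(["6/5/2015 14:25:43", "not a date", "9/10/2015 21:45:12"])

# errors="coerce" turns anything that does not match the format into NaT
parsed = pd.to_datetime(raw, format="%m/%d/%Y %H:%M:%S", errors="coerce")

# True only if the column really holds datetimes rather than strings
is_datetime = pd.api.types.is_datetime64_any_dtype(parsed)

# Number of entries that could not be parsed
n_bad = int(parsed.isna().sum())

print(is_datetime, n_bad)
```

If the column is still of object dtype after conversion, comparisons against datetime.datetime.now() are likely to fail, which may be what "these values aren't comparable" refers to.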

Upvotes: 0

unutbu

Reputation: 879601

You could use

mask = df1['recvd_dttm'] <= datetime.datetime.now()
df1 = df1.loc[mask]

to select only those rows for which recvd_dttm is less than or equal to the current datetime.
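Combined with the question's final-week slice, this could look roughly like the sketch below. The helper name last_week_ignoring_future and its now parameter are hypothetical (added so the cutoff can be fixed for testing), and the sample dates are modeled on the question with the AM/PM suffix dropped.

```python
import pandas as pd

def last_week_ignoring_future(df, column="recvd_dttm", now=None):
    """Drop rows dated after `now`, then keep the 7 days ending at the latest remaining date."""
    # Hypothetical helper; `now` defaults to the current time
    now = pd.Timestamp.now() if now is None else pd.Timestamp(now)
    # Discard future-dated rows first so they cannot become range_max
    past = df.loc[df[column] <= now]
    range_max = past[column].max()
    range_min = range_max - pd.Timedelta(days=7)
    return past.loc[past[column] >= range_min]

# Sample data modeled on the question (assumed values, no AM/PM suffix)
dates = ["6/5/2015 18:28:50", "6/5/2015 14:25:43",
         "9/10/2015 21:45:12", "5/20/2015 09:00:00"]
df1 = pd.DataFrame({"recvd_dttm": pd.to_datetime(dates, format="%m/%d/%Y %H:%M:%S")})

# Pretend "now" is 6 June 2015, so the 9/10/2015 row counts as a future date
df2 = last_week_ignoring_future(df1, now="2015-06-06")
print(df2)
```

Filtering before calling max() matters here: otherwise the mistyped future date would define the end of the week and the slice would contain only that row.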

Upvotes: 1
