Drop Rows From Pandas Dataframe According To Date

Question

I'm trying to create a dataframe with pandas and drop dates later than say 201702. The dataframe is structured as so

    Date         Account Number
1   2019-02-21   123841234
2   2017-01-01   193741927
3   2015-03-04   981237432
4   2018-05-29   134913473
5   2012-05-12   138749173
6   2009-01-04   174917239

I'm reading in the csv (data.csv) and attempting to remove any date after 2017-02-28 as so:

data_csv = pd.read_csv('data.csv')
data_csv[data_csv.DATE < '2017-02-28']

Is this supposed to work correctly with a date format of YYYY-MM-DD or is there something I'd have to do to the column format to ensure that these rows are dropped from the dataframe?

Thank you for your time.

cs95 · Accepted Answer

If your date strings are in YYYY-MM-DD format then lexicographical comparisons work out of the box (for python in general, not just pandas).

'2009-01-04' < '2017-02-28'  
# True

'2019-01-04' < '2017-02-28'
# False

So your comparison should work out without any changes. Although it'd be safer to convert to datetime first, so your code makes no assumptions while still working.

df.dtypes     

Date              object
Account Number     int64
dtype: object

df[df['Date'] < '2017-02-28']

         Date  Account Number
2  2017-01-01       193741927
3  2015-03-04       981237432
5  2012-05-12       138749173
6  2009-01-04       174917239

df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
df.dtypes

Date              datetime64[ns]
Account Number             int64
dtype: object

df[df['Date'] < '2017-02-28']

        Date  Account Number
2 2017-01-01       193741927
3 2015-03-04       981237432
5 2012-05-12       138749173
6 2009-01-04       174917239

Drop Rows From Pandas Dataframe According To Date

Answers (2)

Related Questions