JMV12
JMV12

Reputation: 1035

Drop Rows From Pandas Dataframe According To Date

I'm trying to create a dataframe with pandas and drop dates later than say 201702. The dataframe is structured as so

    Date         Account Number
1   2019-02-21   123841234
2   2017-01-01   193741927
3   2015-03-04   981237432
4   2018-05-29   134913473
5   2012-05-12   138749173
6   2009-01-04   174917239

I'm reading in the csv (data.csv) and attempting to remove any date after 2017-02-28 as so:

data_csv = pd.read_csv('data.csv')
data_csv[data_csv.DATE < '2017-02-28']

Is this supposed to work correctly with a date format of YYYY-MM-DD or is there something I'd have to do to the column format to ensure that these rows are dropped from the dataframe?

Thank you for your time.

Upvotes: 3

Views: 11031

Answers (2)

cs95
cs95

Reputation: 402263

If your date strings are in YYYY-MM-DD format then lexicographical comparisons work out of the box (for python in general, not just pandas).

'2009-01-04' < '2017-02-28'  
# True

'2019-01-04' < '2017-02-28'
# False

So your comparison should work out without any changes. Although it'd be safer to convert to datetime first, so your code makes no assumptions while still working.


df.dtypes     

Date              object
Account Number     int64
dtype: object

df[df['Date'] < '2017-02-28']

         Date  Account Number
2  2017-01-01       193741927
3  2015-03-04       981237432
5  2012-05-12       138749173
6  2009-01-04       174917239

df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
df.dtypes

Date              datetime64[ns]
Account Number             int64
dtype: object

df[df['Date'] < '2017-02-28']

        Date  Account Number
2 2017-01-01       193741927
3 2015-03-04       981237432
5 2012-05-12       138749173
6 2009-01-04       174917239

Upvotes: 3

Dani Mesejo
Dani Mesejo

Reputation: 61910

I suggest you transform the string into a Timestamp, assuming data_csv.Date is also a Timestamp:

result = data_csv[data_csv.Date < pd.to_datetime('2017-02-28')]
print(result)

Output

        Date  Account Number
1 2017-01-01       193741927
2 2015-03-04       981237432
4 2012-05-12       138749173
5 2009-01-04       174917239

Upvotes: 2

Related Questions