Reputation: 3375
I have a csv file like this and this is the code I wrote to filter the date
example['date_1'] = pd.to_datetime(example['date_1'])
example['date_2'] = pd.to_datetime(example['date_2'])
example
date_1 ID date_2
2015-01-12 111 2016-01-20 08:34:00
2016-01-11 222 2016-12-15 08:34:00
2016-01-11 7770 2016-12-15 08:34:00
2016-01-10 7881 2016-11-17 08:32:00
2016-01-03 90243 2016-04-14 08:35:00
2016-01-03 90354 2016-04-14 08:35:00
2015-01-11 1140303 2015-12-15 08:43:00
2015-01-11 1140414 2015-12-15 08:43:00
example[(example['date_1'] <= '2016-11-01')
& (example['date_1'] >= '2015-11-01')
& (example['date_2'] <= '2016-12-16')
& (example['date_2'] >= '2015-12-15')]
Output:
2016-01-11 222 2016-12-15 08:34:00
2016-01-11 7770 2016-12-15 08:34:00
2016-01-10 7881 2016-11-17 08:32:00
2016-01-03 90243 2016-04-14 08:35:00
2016-01-03 90354 2016-04-14 08:35:00
I don't understand why it changes the format of the date, and it seems like it mix up the month&day in the date, with the conditional filter, the expected result should be the same with the original dataset, but it erased several lines? Can someone help me with it, many thanks.
Upvotes: 0
Views: 70
Reputation: 149185
Some locales format the date as dd/mm/YYYY, while others use mm/dd/YYYY. By default pandas uses the american format of mm/dd/YYYY unless it can infer the alternate format from the values (when a day number is greater than 12...).
So if you know that you input date format is dd/mm/YYYY, you must say it to pandas:
example['date_1'] = pd.to_datetime(example['date_1'], dayfirst=True)
example['date_2'] = pd.to_datetime(example['date_2'], dayfirst=True)
Once pandas has a Timestamp column, it internally stores a number of nano seconds from 1970-01-01 00:00, and by default displays it according to ISO-8601, striping parts that are 0 for the columns. Parts being the full time, fractions of seconds or nanoseconds.
You should not care if you want to process the Timestamps. If at the end you want to force a format, explicitely change the column to its string representation:
df['date_1'] = df['date_1'].df.strftime('%d/%m/%Y %H:%M')
Upvotes: 1