Reputation: 1202
I am working with the below DateTime data:
DateTime
13/09/2018 6:16:58 PM
22/09/2018 5:16:07 PM
23/09/2018 10:52:54 AM
23/09/2018 3:01:12 AM
23/09/2018 5:47:44 AM
23/09/2018 7:06:44 PM
23/09/2018 8:46:25 AM
3/9/2018 16:00
3/9/2018 19:39
30/09/2018 9:52:35 PM
6/9/2018 15:49
6/9/2018 16:15
6/9/2018 16:41
6/9/2018 20:02
7/9/2018 13:49
7/9/2018 20:43
7/9/2018 22:26
7/9/2018 9:32
8/9/2018 1:12
8/9/2018 14:24
8/9/2018 16:18
9/9/2018 13:45
As it can be seen the format of DateTime in this column is mixed. I tried the following piece of code to standardize the format and extract some results out of that data:
#Converting the column into DateTime objects
pf['DateTime'] = pd.to_datetime(pf['DateTime'], errors = 'coerce')
# Set up a standard DateTime like format throughout the column
# pf is the data frame
pf['DateTime'].dt.strftime("YYYY-MM-DD hh:mm:ss")
# Grouping the DateTime column by Month
grp = pf.groupby(pf['DateTime'].dt.to_period('M'))
for d in grp:
print('d',d)
But strangely, even after setting the format of the DateTime columns, the dates are being picked out as month and the groups I am getting are :
d (Period('2018-03', 'M')
d (Period('2018-06', 'M')
d (Period('2018-07', 'M')
d (Period('2018-08', 'M')
d (Period('2018-09', 'M')
The fact is the values 6,7,8 are the dates of month 9 and are mistakenly picked up as month. The expected outcome should be:
d (Period('2018-09', 'M')
How to set up the columns in such a way that it doesn't pick the date as a month.
Upvotes: 1
Views: 1592
Reputation: 821
The problem was that the to_datetime
function directly assumes that the format is %d/%m/%Y
. The %d
is handling a zero padded value for the days, so it doesn't recognize that 6/9/2018
means 6th of September 2018.
To handle that problem you need to use dayfirst
of the to_datetime
function. By doing so you get the following:
Without dayfirst
:
pf['DateTime'] = pd.to_datetime(pf['DateTime'])
pf
Out[69]:
DateTime
0 2018-09-13 18:16:58
1 2018-09-22 17:16:07
2 2018-09-23 10:52:54
3 2018-09-23 03:01:12
4 2018-09-23 05:47:44
5 2018-09-23 19:06:44
6 2018-09-23 08:46:25
7 2018-03-09 16:00:00
8 2018-03-09 19:39:00
9 2018-09-30 21:52:35
10 2018-06-09 15:49:00
11 2018-06-09 16:15:00
12 2018-06-09 16:41:00
13 2018-06-09 20:02:00
14 2018-07-09 13:49:00
15 2018-07-09 20:43:00
16 2018-07-09 22:26:00
17 2018-07-09 09:32:00
18 2018-08-09 01:12:00
19 2018-08-09 14:24:00
20 2018-08-09 16:18:00
21 2018-09-09 13:45:00
With dayfirst
:
pf = pd.DataFrame(d)
pf['DateTime'] = pd.to_datetime(pf['DateTime'], dayfirst=True)
pf
Out[72]:
DateTime
0 2018-09-13 18:16:58
1 2018-09-22 17:16:07
2 2018-09-23 10:52:54
3 2018-09-23 03:01:12
4 2018-09-23 05:47:44
5 2018-09-23 19:06:44
6 2018-09-23 08:46:25
7 2018-09-03 16:00:00
8 2018-09-03 19:39:00
9 2018-09-30 21:52:35
10 2018-09-06 15:49:00
11 2018-09-06 16:15:00
12 2018-09-06 16:41:00
13 2018-09-06 20:02:00
14 2018-09-07 13:49:00
15 2018-09-07 20:43:00
16 2018-09-07 22:26:00
17 2018-09-07 09:32:00
18 2018-09-08 01:12:00
19 2018-09-08 14:24:00
20 2018-09-08 16:18:00
21 2018-09-09 13:45:00
Upvotes: 1