zsh_18
zsh_18

Reputation: 1202

How to fix the datetime format in a column with pandas

I am working with the below DateTime data:

DateTime
13/09/2018 6:16:58 PM
22/09/2018 5:16:07 PM
23/09/2018 10:52:54 AM
23/09/2018 3:01:12 AM
23/09/2018 5:47:44 AM
23/09/2018 7:06:44 PM
23/09/2018 8:46:25 AM
3/9/2018 16:00
3/9/2018 19:39
30/09/2018 9:52:35 PM
6/9/2018 15:49
6/9/2018 16:15
6/9/2018 16:41
6/9/2018 20:02
7/9/2018 13:49
7/9/2018 20:43
7/9/2018 22:26
7/9/2018 9:32
8/9/2018 1:12
8/9/2018 14:24
8/9/2018 16:18
9/9/2018 13:45

As it can be seen the format of DateTime in this column is mixed. I tried the following piece of code to standardize the format and extract some results out of that data:


#Converting the column into DateTime objects

pf['DateTime'] = pd.to_datetime(pf['DateTime'], errors = 'coerce')


# Set up a standard DateTime like format throughout the column
# pf is the data frame

pf['DateTime'].dt.strftime("YYYY-MM-DD hh:mm:ss")

# Grouping the DateTime column by Month
grp = pf.groupby(pf['DateTime'].dt.to_period('M'))

for d in grp:
    print('d',d)

But strangely, even after setting the format of the DateTime columns, the dates are being picked out as month and the groups I am getting are :

d (Period('2018-03', 'M')

d (Period('2018-06', 'M')

d (Period('2018-07', 'M')

d (Period('2018-08', 'M')

d (Period('2018-09', 'M')

The fact is the values 6,7,8 are the dates of month 9 and are mistakenly picked up as month. The expected outcome should be:

d (Period('2018-09', 'M')

How to set up the columns in such a way that it doesn't pick the date as a month.

Upvotes: 1

Views: 1592

Answers (1)

nidabdella
nidabdella

Reputation: 821

The problem was that the to_datetime function directly assumes that the format is %d/%m/%Y. The %d is handling a zero padded value for the days, so it doesn't recognize that 6/9/2018 means 6th of September 2018.

To handle that problem you need to use dayfirst of the to_datetime function. By doing so you get the following:

Without dayfirst:

pf['DateTime'] = pd.to_datetime(pf['DateTime'])
pf
Out[69]: 
              DateTime
0  2018-09-13 18:16:58
1  2018-09-22 17:16:07
2  2018-09-23 10:52:54
3  2018-09-23 03:01:12
4  2018-09-23 05:47:44
5  2018-09-23 19:06:44
6  2018-09-23 08:46:25
7  2018-03-09 16:00:00
8  2018-03-09 19:39:00
9  2018-09-30 21:52:35
10 2018-06-09 15:49:00
11 2018-06-09 16:15:00
12 2018-06-09 16:41:00
13 2018-06-09 20:02:00
14 2018-07-09 13:49:00
15 2018-07-09 20:43:00
16 2018-07-09 22:26:00
17 2018-07-09 09:32:00
18 2018-08-09 01:12:00
19 2018-08-09 14:24:00
20 2018-08-09 16:18:00
21 2018-09-09 13:45:00

With dayfirst:

    pf = pd.DataFrame(d)
pf['DateTime'] = pd.to_datetime(pf['DateTime'],  dayfirst=True)
pf
Out[72]: 
              DateTime
0  2018-09-13 18:16:58
1  2018-09-22 17:16:07
2  2018-09-23 10:52:54
3  2018-09-23 03:01:12
4  2018-09-23 05:47:44
5  2018-09-23 19:06:44
6  2018-09-23 08:46:25
7  2018-09-03 16:00:00
8  2018-09-03 19:39:00
9  2018-09-30 21:52:35
10 2018-09-06 15:49:00
11 2018-09-06 16:15:00
12 2018-09-06 16:41:00
13 2018-09-06 20:02:00
14 2018-09-07 13:49:00
15 2018-09-07 20:43:00
16 2018-09-07 22:26:00
17 2018-09-07 09:32:00
18 2018-09-08 01:12:00
19 2018-09-08 14:24:00
20 2018-09-08 16:18:00
21 2018-09-09 13:45:00

Upvotes: 1

Related Questions