Reputation: 1352
I am having trouble formatting dates and times. My data file contains dates and times; the sample below represents part of my data.
import pandas as pd
data = pd.DataFrame()
data['Date'] = ['01 Jul 2014 - Qualification','30 Sep 2014 - Group Stage','17 Mar 2015 - Play Offs',' 19:00:00']
data['ID'] = [1,2,3,4]
I created a new column and tried to parse it with to_datetime as follows:
data['date1'] = pd.to_datetime(data.Date,errors = 'coerce')
I got NaT for every row. I would also like to create two new columns, Time and Stage, to hold the time and the game stage.
How can I do this?
Upvotes: 0
Views: 317
Reputation: 862661
You can use a regex here with Series.str.extract:
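# Two alternative patterns follow; the second assignment overwrites the first, so only the second is used.
#https://stackoverflow.com/a/47656743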
pat = r'(\d+/\d+(?:/\d+)?|(?:\d+ )?(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)[.,]?(?:-\d+-\d+| \d+(?:th|rd|st|nd)?,? \d+| \d+)|\d{4})'
#https://stackoverflow.com/a/46069885
pat = r'((?:\d{,2}\s)?(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*(?:-|\.|\s|,)\s?\d{,2}[a-z]*(?:-|,|\s)?\s?\d{2,4})'
s = data['Date'].str.extract(pat, expand=False)
data['date1'] = pd.to_datetime(s, errors='coerce')
print(data)
                          Date  ID      date1
0  01 Jul 2014 - Qualification   1 2014-07-01
1    30 Sep 2014 - Group Stage   2 2014-09-30
2      17 Mar 2015 - Play Offs   3 2015-03-17
3                     19:00:00   4        NaT
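The question also asks for Stage and Time columns, which the pattern above does not produce. A minimal sketch of one way to get them, assuming every dated row looks like "DD Mon YYYY - stage" and time-only rows look like "HH:MM:SS" (the column names stage and time and both patterns are my own choices, not part of the original data):

```python
import pandas as pd

data = pd.DataFrame({
    'Date': ['01 Jul 2014 - Qualification', '30 Sep 2014 - Group Stage',
             '17 Mar 2015 - Play Offs', ' 19:00:00'],
    'ID': [1, 2, 3, 4],
})

# Named groups become the column names of the extracted frame;
# rows that do not match get NaN in every group.
parts = data['Date'].str.extract(
    r'^\s*(?P<date1>\d{1,2} [A-Za-z]{3} \d{4})\s*-\s*(?P<stage>.+?)\s*$'
)

# An explicit format avoids guessing; non-matching rows become NaT.
data['date1'] = pd.to_datetime(parts['date1'], format='%d %b %Y', errors='coerce')
data['stage'] = parts['stage']

# Rows that hold only a clock time, such as ' 19:00:00'.
# The value is kept as a string here (assumption: no further time arithmetic is needed).
data['time'] = data['Date'].str.extract(r'^\s*(\d{1,2}:\d{2}:\d{2})\s*$', expand=False)

print(data)
```

If the real file has other layouts (full month names, missing stage text), the patterns would need to be loosened accordingly.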
Upvotes: 1
Reputation: 5463
The Date column contains text other than the date/time, so it cannot be converted to datetime as it is. You need to isolate the date/time part of the text from the rest. To do this, you can split on - with expand=True to get the date and the stage text in separate columns of a temporary dataframe df_temp, and then assign those columns to new columns in your existing dataframe:
In [27]: df_temp = data['Date'].str.split('-', expand=True)
In [28]: data['date1'] = df_temp[0]
In [29]: data['stage'] = df_temp[1]
In [30]: data
Out[30]:
                          Date  ID        date1          stage
0  01 Jul 2014 - Qualification   1  01 Jul 2014  Qualification
1    30 Sep 2014 - Group Stage   2  30 Sep 2014    Group Stage
2      17 Mar 2015 - Play Offs   3  17 Mar 2015      Play Offs
3                     19:00:00   4     19:00:00           None
In [31]: data['date1'] = pd.to_datetime(data.date1,errors = 'coerce')
In [32]: data
Out[32]:
                          Date  ID      date1          stage
0  01 Jul 2014 - Qualification   1 2014-07-01  Qualification
1    30 Sep 2014 - Group Stage   2 2014-09-30    Group Stage
2      17 Mar 2015 - Play Offs   3 2015-03-17      Play Offs
3                     19:00:00   4        NaT           None
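One caveat with splitting on a bare -: the spaces around the hyphen stay attached to both parts, and a stage name that itself contains a hyphen would be cut into extra columns. A hedged variant of the same idea, assuming the separator is always ' - ' and the dates are always in "DD Mon YYYY" form (the variable name parts is just for illustration):

```python
import pandas as pd

data = pd.DataFrame({
    'Date': ['01 Jul 2014 - Qualification', '30 Sep 2014 - Group Stage',
             '17 Mar 2015 - Play Offs', ' 19:00:00'],
    'ID': [1, 2, 3, 4],
})

# n=1 splits only at the first ' - ', so a hyphen inside the stage text is preserved.
parts = data['Date'].str.split(' - ', n=1, expand=True)

# Strip stray whitespace before parsing; rows that are not dates become NaT.
data['date1'] = pd.to_datetime(parts[0].str.strip(), format='%d %b %Y', errors='coerce')

# Rows without a separator have no second part, so stage stays missing there.
data['stage'] = parts[1].str.strip()

print(data)
```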
Upvotes: 1