Reputation: 1352
I have been trying different ways to handle datetimes in pandas when reading from a CSV file.
I have 3 columns in the CSV file: kickoffDate, kickoffTime, and time.
The first two columns are well formatted, but the format of the third column, "time", is mixed: some values are shown as a time and some as a date and time.
For example:
12:00:00 AM
1/1/1900 9:04:00 PM
How do I convert them to the same format using pandas?
Second, I want to add all three columns together to get an event time.
For example:
kickoffDate =['8/6/2017','8/6/2017','8/6/2017']
kickoffTime =['15:00:00','15:00:00','15:00:00']
time =['51:48:00','86:05:00','10:04']
Here time is in mm:ss:00 format (the last value is plain mm:ss). I want to combine those three columns to create a new column called eventdatetime:
eventdatetime = ['06-08-2017 15:51:48', '06-08-2017 16:26:05', '06-08-2017 15:10:04']
How can I do this? I am able to combine the first two using the following formula:
DateTime1 = data['kickoffDate'] + ' ' + data['kickoffTime']
The original CSV file can be downloaded from the link below:
https://drive.google.com/open?id=1JL65x7nq2m6zk4qnaRUDKL894aEdXW_B
Upvotes: 0
Views: 235
Reputation: 863166
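You can use the parse_dates parameter to parse the first and second columns together into one datetime column. Then convert the last column with to_timedelta: strip the trailing :00 from values that have two colons (mm:ss:00) so that every value is mm:ss, and prepend 00: for the hours: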
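import pandas as pd

# Parse columns 0 and 1 together into a single datetime column.
# Note: passing a nested list to parse_dates is deprecated in pandas 2.2+.
df = pd.read_csv('Datetimetest.csv', parse_dates=[[0, 1]])
# m is True for values with two colons (mm:ss:00); strip the trailing ':00' there.
m = df['time'].str.count(':') != 1
df['time'] = pd.to_timedelta('00:' + df['time'].mask(m, df['time'].str.replace(':00$', '', regex=True)))
df['eventdatetime'] = df['kickoffDate_kickoffTime'] + df['time']
print(df.head())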
  kickoffDate_kickoffTime     time       eventdatetime
0     2018-04-30 19:00:00 00:47:36 2018-04-30 19:47:36
1     2018-04-30 19:00:00 00:15:28 2018-04-30 19:15:28
2     2018-04-29 13:15:00 00:52:03 2018-04-29 14:07:03
3     2018-04-29 13:15:00 01:03:42 2018-04-29 14:18:42
4     2018-04-29 13:15:00 00:10:43 2018-04-29 13:25:43
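As a variation on the time conversion step, here is a minimal sketch that pulls the minutes and seconds out with a regular expression instead of rewriting the string. It uses the three sample values from the question; the helper name elapsed_from_mmss is made up for illustration and is not a pandas function.

```python
import pandas as pd

def elapsed_from_mmss(s: pd.Series) -> pd.Series:
    """Convert 'mm:ss' or 'mm:ss:00' strings to timedeltas (hypothetical helper)."""
    # Capture only the first two colon-separated fields; a trailing ':00' is ignored.
    parts = s.str.extract(r'^(\d+):(\d+)').astype(int)
    # Build the timedelta from explicit units, so minutes above 59 are fine.
    return pd.to_timedelta(parts[0], unit='m') + pd.to_timedelta(parts[1], unit='s')

# Sample values taken from the question.
time = pd.Series(['51:48:00', '86:05:00', '10:04'])
elapsed = elapsed_from_mmss(time)
print(elapsed)
```

This assumes every value starts with digits, a colon, and more digits; anything else would become NaN in the extract step and fail in astype(int).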
Another solution, if you want 4 separate columns in the output: convert only the first column to datetimes and the other two to timedeltas:
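import pandas as pd

# Parse only the first column (kickoffDate) as datetimes.
df = pd.read_csv('Datetimetest.csv', parse_dates=[0])
# m is True for values with two colons (mm:ss:00); strip the trailing ':00' there.
m = df['time'].str.count(':') != 1
df['time'] = pd.to_timedelta('00:' + df['time'].mask(m, df['time'].str.replace(':00$', '', regex=True)))
df['kickoffTime'] = pd.to_timedelta(df['kickoffTime'])
df['eventdatetime'] = df['kickoffDate'] + df['kickoffTime'] + df['time']
print(df.head())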
  kickoffDate kickoffTime     time       eventdatetime
0  2018-04-30    19:00:00 00:47:36 2018-04-30 19:47:36
1  2018-04-30    19:00:00 00:15:28 2018-04-30 19:15:28
2  2018-04-29    13:15:00 00:52:03 2018-04-29 14:07:03
3  2018-04-29    13:15:00 01:03:42 2018-04-29 14:18:42
4  2018-04-29    13:15:00 00:10:43 2018-04-29 13:25:43
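To make the arithmetic explicit, here is a small worked example for row 2 of the output above: a datetime column plus two timedelta columns gives a datetime column. The values are typed in by hand rather than read from the file.

```python
import pandas as pd

# One row, with the values from row 2 of the printed output.
kickoff_date = pd.to_datetime(pd.Series(['2018-04-29']))
kickoff_time = pd.to_timedelta(pd.Series(['13:15:00']))
elapsed = pd.to_timedelta(pd.Series(['00:52:03']))

# datetime + timedelta + timedelta -> datetime
event = kickoff_date + kickoff_time + elapsed
print(event.iloc[0])  # 2018-04-29 14:07:03
```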
EDIT:
If the input data does not come from a CSV file, you can convert the first column to datetimes with to_datetime instead of the parse_dates parameter of read_csv:
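import pandas as pd

df = pd.read_csv('Datetimetest.csv')
# m is True for values with two colons (mm:ss:00); strip the trailing ':00' there.
m = df['time'].str.count(':') != 1
df['time'] = pd.to_timedelta('00:' + df['time'].mask(m, df['time'].str.replace(':00$', '', regex=True)))
# Convert the date column explicitly instead of relying on read_csv.
df['kickoffDate'] = pd.to_datetime(df['kickoffDate'])
df['kickoffTime'] = pd.to_timedelta(df['kickoffTime'])
df['eventdatetime'] = df['kickoffDate'] + df['kickoffTime'] + df['time']
print(df.head())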
  kickoffDate kickoffTime     time       eventdatetime
0  2018-04-30    19:00:00 00:47:36 2018-04-30 19:47:36
1  2018-04-30    19:00:00 00:15:28 2018-04-30 19:15:28
2  2018-04-29    13:15:00 00:52:03 2018-04-29 14:07:03
3  2018-04-29    13:15:00 01:03:42 2018-04-29 14:18:42
4  2018-04-29    13:15:00 00:10:43 2018-04-29 13:25:43
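For data that is already in memory, the same steps can be sketched on the three sample rows from the question. One assumption here: the dates are read as month/day/year (so '8/6/2017' is 6 August 2017), which is what the expected output in the question suggests; adjust the format string if your file uses day/month/year.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    'kickoffDate': ['8/6/2017', '8/6/2017', '8/6/2017'],
    'kickoffTime': ['15:00:00', '15:00:00', '15:00:00'],
    'time': ['51:48:00', '86:05:00', '10:04'],
})

# Strip the trailing ':00' from mm:ss:00 values, then prepend '00:' for the hours.
m = df['time'].str.count(':') != 1
df['time'] = pd.to_timedelta('00:' + df['time'].mask(m, df['time'].str.replace(':00$', '', regex=True)))

# Assumption: month/day/year input; an explicit format avoids ambiguous parsing.
df['kickoffDate'] = pd.to_datetime(df['kickoffDate'], format='%m/%d/%Y')
df['kickoffTime'] = pd.to_timedelta(df['kickoffTime'])
df['eventdatetime'] = df['kickoffDate'] + df['kickoffTime'] + df['time']

# Format as dd-mm-yyyy HH:MM:SS strings, as in the question.
formatted = df['eventdatetime'].dt.strftime('%d-%m-%Y %H:%M:%S').tolist()
print(formatted)
```

The strftime call is only for display; keep eventdatetime as a datetime column if you need to sort or do further arithmetic on it.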
Upvotes: 1