Reputation: 814
I have some data that looks like:
key DATE - DAY DATE - MONTH DATE - YEAR GMT HRS GMT MINUTES
1 2 29 2 2016 2 2
2 3 29 2 2016 2 2
3 4 29 2 2016 2 2
4 5 29 2 2016 2 2
5 6 29 2 2016 2 2
6 7 29 2 2016 2 2
7 8 29 2 2016 2 3
8 9 29 2 2016 2 3
9 10 29 2 2016 2 3
GMT SECONDS
1 54
2 55
3 56
4 57
5 58
6 59
7 0
8 1
9 2
At first the data was type float
and the year was in format 16
so I did:
t['DATE - MONTH'] = t['DATE - MONTH'].astype(int)
t['DATE - YEAR'] = t['DATE - YEAR'].astype(int)
t['DATE - YEAR'] = t['DATE - YEAR']+2000
t['DATE - DAY'] = t['DATE - DAY'].astype(int)
^Note I was also confused why when using an index number rather than the column name you only work on what seems to be a temp table ie you can print the desired result but it didnt change the data frame.
Then I tried two methods:
t['Date'] = pd.to_datetime(dict(year=t['DATE - YEAR'], month = t['DATE - MONTH'], day = t['DATE - DAY']))
t['Date'] = pd.to_datetime((t['DATE - YEAR']*10000+t['DATE - MONTH']*100+t['DATE - DAY']).apply(str),format='%Y%m%d')
Both return:
ValueError: cannot assemble the datetimes: time data 20000000 does not match format '%Y%m%d' (match)
I'd like to create a date column (and then after use a similar logic for a datetime column with the additional 3 columns).
What is the problem?
EDIT: I had bad data and added errors='coerce'
to handle those rows
Upvotes: 2
Views: 3553
Reputation: 862581
First rename all columns, filter by values of dict and use to_datetime
:
Assembling a datetime from multiple columns of a DataFrame. The keys can be common abbreviations like ['year', 'month', 'day', 'minute', 'second', 'ms', 'us', 'ns']) or plurals of the same.
d = {'DATE - YEAR':'year','DATE - MONTH':'month','DATE - DAY':'day',
'GMT HRS':'hour','GMT MINUTES':'minute','GMT SECONDS':'second'}
df['datetime'] = pd.to_datetime(df.rename(columns=d)[list(d.values())])
print (df)
key DATE - DAY DATE - MONTH DATE - YEAR GMT HRS GMT MINUTES \
1 2 29 2 2016 2 2
2 3 29 2 2016 2 2
3 4 29 2 2016 2 2
4 5 29 2 2016 2 2
5 6 29 2 2016 2 2
6 7 29 2 2016 2 2
7 8 29 2 2016 2 3
8 9 29 2 2016 2 3
9 10 29 2 2016 2 3
GMT SECONDS datetime
1 54 2016-02-29 02:02:54
2 55 2016-02-29 02:02:55
3 56 2016-02-29 02:02:56
4 57 2016-02-29 02:02:57
5 58 2016-02-29 02:02:58
6 59 2016-02-29 02:02:59
7 0 2016-02-29 02:03:00
8 1 2016-02-29 02:03:01
9 2 2016-02-29 02:03:02
Detail:
print (df.rename(columns=d)[list(d.values())])
day month second year minute hour
1 29 2 54 2016 2 2
2 29 2 55 2016 2 2
3 29 2 56 2016 2 2
4 29 2 57 2016 2 2
5 29 2 58 2016 2 2
6 29 2 59 2016 2 2
7 29 2 0 2016 3 2
8 29 2 1 2016 3 2
9 29 2 2 2016 3 2
Upvotes: 3