Reputation: 21513
In my dataframe, the time is separated in 3 columns: year, month, day, like this:
How can I convert them into a date, so I can do time series analysis?
I can do this:
df.apply(lambda x:'%s %s %s' % (x['year'],x['month'], x['day']),axis=1)
which gives:
1095 1954 1 1
1096 1954 1 2
1097 1954 1 3
1098 1954 1 4
1099 1954 1 5
1100 1954 1 6
1101 1954 1 7
1102 1954 1 8
1103 1954 1 9
1104 1954 1 10
1105 1954 1 11
1106 1954 1 12
1107 1954 1 13
But what follows?
EDIT: This is what I ended up with:
from datetime import datetime
df['date']= df.apply(lambda x:datetime.strptime("{0} {1} {2}".format(x['year'],x['month'], x['day']), "%Y %m %d"),axis=1)
df.index= df['date']
Upvotes: 13
Views: 12601
Reputation: 3497
There is a simpler and much faster way to convert 3 columns with year, month and day to a single datetime column in pandas:
import pandas
pandas.to_datetime(df)
Besides the code being much simpler than the accepted answer, it is also much faster: on my computer, with a 1 million row dataframe, your implementation takes 22.3 seconds, while this one takes 175 milliseconds. That is roughly 127x faster.
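If you want to check the difference on your own machine, here is a rough timing sketch. The synthetic data, the row count (n_rows, kept far smaller than 1 million so it finishes quickly) and the variable names are my own assumptions, so the absolute numbers will not match the ones above:

```python
import time
from datetime import datetime

import numpy as np
import pandas as pd

# Synthetic data; n_rows is an arbitrary, assumed size for a quick run.
n_rows = 10_000
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "year": rng.integers(1950, 2000, n_rows),
    "month": rng.integers(1, 13, n_rows),
    "day": rng.integers(1, 29, n_rows),  # at most 28, so every date is valid
})

# Row-by-row approach from the question: format to a string, then parse it.
start = time.perf_counter()
slow = df.apply(
    lambda x: datetime.strptime(
        "{0} {1} {2}".format(x["year"], x["month"], x["day"]), "%Y %m %d"
    ),
    axis=1,
)
slow_seconds = time.perf_counter() - start

# Vectorized approach: assemble the dates from the year/month/day columns.
start = time.perf_counter()
fast = pd.to_datetime(df)
fast_seconds = time.perf_counter() - start

print(f"apply + strptime: {slow_seconds:.3f} s")
print(f"pd.to_datetime:   {fast_seconds:.3f} s")
```

Both approaches should produce the same dates; only the time taken differs.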
Note that in your case, the columns are already named year, month and day, which is a requirement for the input DataFrame of to_datetime. If they have different names, you need to rename them first (e.g. df.rename(columns={'<your_year_col>': 'year', ...})).
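As a minimal sketch of the rename step: the column names yr, mo, dy and value below are made up for illustration, so substitute your own. Selecting only the three date columns before calling to_datetime avoids an error about extra keys when the dataframe has other columns as well.

```python
import pandas as pd

# Hypothetical column names, plus an unrelated "value" column.
raw = pd.DataFrame({
    "yr": [1954, 1954, 1954],
    "mo": [1, 1, 1],
    "dy": [1, 2, 3],
    "value": [0.5, 0.7, 0.2],
})

# Rename to the names to_datetime expects.
parts = raw.rename(columns={"yr": "year", "mo": "month", "dy": "day"})

# Pass only the date columns, then use the result as the index.
raw["date"] = pd.to_datetime(parts[["year", "month", "day"]])
raw = raw.set_index("date")
print(raw)
```

With the dates as the index, the dataframe can be sliced and resampled by time.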
Upvotes: 2
Reputation: 12897
It makes no sense to format a date to a string and immediately reparse it; use the datetime constructor instead:
import datetime
df.apply(lambda x: datetime.date(x['year'], x['month'], x['day']), axis=1)
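One thing to keep in mind: this gives you plain Python date objects rather than pandas timestamps. Here is a small sketch of turning the result into a DatetimeIndex for time series work. The sample data and the int() casts are my own additions; the casts are a precaution in case the columns are not plain integers.

```python
import datetime

import pandas as pd

# Small made-up sample with the same column names as in the question.
df = pd.DataFrame({"year": [1954, 1954], "month": [1, 1], "day": [1, 2]})

# Build datetime.date objects row by row, without going through strings.
dates = df.apply(
    lambda x: datetime.date(int(x["year"]), int(x["month"]), int(x["day"])),
    axis=1,
)

# Convert the date objects to pandas timestamps and use them as the index.
df.index = pd.DatetimeIndex(pd.to_datetime(dates))
print(df)
```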
Upvotes: 4
Reputation: 1069
Here's how to convert the values to a datetime:
from datetime import datetime
df.apply(lambda x:datetime.strptime("{0} {1} {2} 00:00:00".format(x['year'],x['month'], x['day']), "%Y %m %d %H:%M:%S"),axis=1)
Upvotes: 9