Reputation: 2330
I have a pandas dataframe:
lat lng alt days date time
0 40.003834 116.321462 211 39745.175405 2008-10-24 04:12:35
1 40.003783 116.321431 201 39745.175463 2008-10-24 04:12:40
2 40.003690 116.321429 203 39745.175521 2008-10-24 04:12:45
3 40.003589 116.321427 194 39745.175579 2008-10-24 04:12:50
4 40.003522 116.321412 190 39745.175637 2008-10-24 04:12:55
5 40.003509 116.321484 188 39745.175694 2008-10-24 04:13:00
For which I am trying to convert the df['date'] and df['time'] columns into a datetime. I can do:
df['Datetime'] = pd.to_datetime(df['date']+df['time'])
df = df.set_index(['Datetime'])
del df['date']
del df['time']
And I get:
lat lng alt days
Datetime
2008-10-2404:12:35 40.003834 116.321462 211 39745.175405
2008-10-2404:12:40 40.003783 116.321431 201 39745.175463
2008-10-2404:12:45 40.003690 116.321429 203 39745.175521
2008-10-2404:12:50 40.003589 116.321427 194 39745.175579
2008-10-2404:12:55 40.003522 116.321412 190 39745.175637
But then if I try:
df.between_time(time(1),time(22,59,59))['lng'].std()
I get an error - 'TypeError: Index must be DatetimeIndex'
So, I've also tried setting the DatetimeIndex:
df['Datetime'] = pd.to_datetime(df['date']+df['time'])
#df = df.set_index(['Datetime'])
df = df.set_index(pd.DatetimeIndex(df['Datetime']))
del df['date']
del df['time']
And this throws an error also - 'DateParseError: unknown string format'
How do I create the datetime column and DatetimeIndex correctly so that df.between_time() works right?
Upvotes: 105
Views: 316428
Reputation: 21
I had trouble with setting a column formatted as YYYY-MM-DD as a date time index column in a data frame I needed for time series forecasting. This is how I solved it for a dateframe where I wanted "dateCol" to be the datetime index:
idx = pd.DatetimeIndex(self.df[dateCol])
self.df = self.df.set_index(idx)
Then to drop the column so it's not duplicated in the dataframe
self.df = self.df.drop(dateCol, axis=1)
Upvotes: 1
Reputation: 429
You may also want to set inplace=True
. This way it returns the same df
df["datetime"] = pd.to_datetime(df["date"] + " " + df["time"], format = "%Y-%m-%d %H:%M:%S")
df.set_index(["datetime"], inplace=True)
Upvotes: 4
Reputation: 91
This worked best for me:
format = '%Y-%m-%d%H:%M:%S'
df['Datetime'] = pd.to_datetime(df['date'] + df['time'].astype("string"), format=format)
In some cases Python treats df['date']
as column of integers.
Upvotes: 1
Reputation: 1728
To simplify Kirubaharan's answer a bit:
df['Datetime'] = pd.to_datetime(df['date'] + ' ' + df['time'])
df = df.set_index('Datetime')
And to get rid of unwanted columns (as OP did but did not specify per se in the question):
df = df.drop(['date','time'], axis=1)
Upvotes: 138
Reputation: 2365
You are not creating datetime index properly,
format = '%Y-%m-%d %H:%M:%S'
df['Datetime'] = pd.to_datetime(df['date'] + ' ' + df['time'], format=format)
df = df.set_index(pd.DatetimeIndex(df['Datetime']))
Upvotes: 88