Reputation: 1295
I was having trouble manipulating a time-series data provided to me for a project. The data contains the number of flight bookings made on a website per second in a duration of 30 minutes. Here is a part of the column containing the timestamp
>>> df['Date_time']
0 7/14/2017 2:14:14 PM
1 7/14/2017 2:14:37 PM
2 7/14/2017 2:14:38 PM
I wanted to do
>>> pd.set_index('Date_time')
and use the datetime and timedelta methods provided by pandas to generate the timestamp to be used as index to access and modify any value in any cell. Something like
>>> td=datetime(year=2017,month=7,day=14,hour=2,minute=14,second=36)
>>> td1=dt.timedelta(minutes=1,seconds=58)
>>> ti1=td1+td
>>> df.at[ti1,'column_name']=65000
But the timestamp generated is of the form
>>> print(ti1)
2017-07-14 02:16:34
Which cannot be directly used as an index in my case as can be clearly seen. Is there a workaround for the above case without writing additional methods myself?
I want to do the above as it provides me greater level of control over the data than looking for the default numerical index for each row I want to update and hence will prove more efficient accordig to me
Upvotes: 0
Views: 56
Reputation: 942
Can you check the dtype of the 'Date_time' column and confirm for me that it is string (object) ?
df.dtypes
If so, you should be able to cast the values to pd.Timestamp by using the following.
df['timestamp'] = df['Date_time'].apply(pd.Timestamp)
When we call .dtypes now, we should have a 'timestamp' field of type datetime64[ns], which allows us to use builtin pandas methods more easily.
I would suggest it is prudent to index the dataframe by the timestamp too, achieved by setting the index equal to that column.
df.set_index('timestamp', inplace=True)
We should now be able to use some more useful methods such as
df.loc[timestamp_to_check, :]
df.loc[start_time_stamp : end_timestamp, : ]
df.asof(timestamp_to_check)
to lookup values from the DataFrame based upon passing a datetime.datetime / pd.Timestamp / np.datetime64 into the above. Note that you will need to cast any string (object) 'lookups' to one of the above types in order to make use of the above correctly.
I prefer to use pd.Timestamp() - https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Timestamp.html to handle datetime conversion from strings unless I am explicitly certain of what format the datetime string is always going to be in.
Upvotes: 1