Anubhav Pandey
Anubhav Pandey

Reputation: 1295

Difference in the way of representation of timestamp in the provided dataset and the one generated by pandas.datetime

I was having trouble manipulating a time-series data provided to me for a project. The data contains the number of flight bookings made on a website per second in a duration of 30 minutes. Here is a part of the column containing the timestamp

>>> df['Date_time']

 0     7/14/2017 2:14:14 PM           
 1     7/14/2017 2:14:37 PM           
 2     7/14/2017 2:14:38 PM

I wanted to do

>>> pd.set_index('Date_time')

and use the datetime and timedelta methods provided by pandas to generate the timestamp to be used as index to access and modify any value in any cell. Something like

>>> td=datetime(year=2017,month=7,day=14,hour=2,minute=14,second=36)
>>> td1=dt.timedelta(minutes=1,seconds=58)
>>> ti1=td1+td
>>> df.at[ti1,'column_name']=65000

But the timestamp generated is of the form

>>> print(ti1)
2017-07-14 02:16:34

Which cannot be directly used as an index in my case as can be clearly seen. Is there a workaround for the above case without writing additional methods myself?

I want to do the above as it provides me greater level of control over the data than looking for the default numerical index for each row I want to update and hence will prove more efficient accordig to me

Upvotes: 0

Views: 56

Answers (1)

emmet02
emmet02

Reputation: 942

Can you check the dtype of the 'Date_time' column and confirm for me that it is string (object) ?

df.dtypes

If so, you should be able to cast the values to pd.Timestamp by using the following.

df['timestamp'] = df['Date_time'].apply(pd.Timestamp)

When we call .dtypes now, we should have a 'timestamp' field of type datetime64[ns], which allows us to use builtin pandas methods more easily.

I would suggest it is prudent to index the dataframe by the timestamp too, achieved by setting the index equal to that column.

df.set_index('timestamp', inplace=True)

We should now be able to use some more useful methods such as

df.loc[timestamp_to_check, :]
df.loc[start_time_stamp : end_timestamp, : ]
df.asof(timestamp_to_check)

to lookup values from the DataFrame based upon passing a datetime.datetime / pd.Timestamp / np.datetime64 into the above. Note that you will need to cast any string (object) 'lookups' to one of the above types in order to make use of the above correctly.

I prefer to use pd.Timestamp() - https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Timestamp.html to handle datetime conversion from strings unless I am explicitly certain of what format the datetime string is always going to be in.

Upvotes: 1

Related Questions