Chris

Reputation: 3269

Capturing Datetime Objects in Pandas Dataframe

If I'm reading the docs correctly for Pandas 0.13.1, read_csv should yield columns of datetimes when parse_dates = [<col1>,<col2>...] is passed during the read. What I'm getting instead is columns of Timestamp objects. Even after applying pd.to_datetime, I still end up with Timestamp objects. What am I missing here? How can I read the strings and convert them straight to datetime objects that are stored in the dataframe? It seems as if the datetime objects are getting converted to Timestamps in the dataframe.

df = pd.read_csv('Beijing_2010_HourlyPM2.5_created20140325.csv', parse_dates=['Date (LST)'])

df['Date (LST)'][0] yields
Timestamp('2010-01-01 23:00:00', tz=None)

df['Date (LST)'] = pd.to_datetime(df['Date (LST)'])

df['Date (LST)'][0] still yields
Timestamp('2010-01-01 23:00:00', tz=None)

Upvotes: 2

Views: 1333

Answers (1)

Andy Hayden

Reputation: 375375

Timestamp is how pandas represents datetimes. You can convert between Timestamp, datetime64 and datetime, but most of the time Timestamp is what you want (and pandas converts to it for you by default).
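As a rough sketch of moving between the three types for a single value (the date is the one from the question; the variable names are just for illustration):

```python
import datetime

import numpy as np
import pandas as pd

# A single pandas Timestamp, using the value from the question.
ts = pd.Timestamp("2010-01-01 23:00:00")

# Timestamp -> standard-library datetime
py_dt = ts.to_pydatetime()

# Timestamp -> numpy datetime64
np_dt = ts.to_datetime64()

# datetime -> Timestamp again
round_trip = pd.Timestamp(py_dt)

# Timestamp subclasses datetime.datetime, so it can usually be passed
# to code that expects a plain datetime.
print(type(py_dt), type(np_dt), isinstance(ts, datetime.datetime))
```

Since Timestamp is a subclass of datetime.datetime, code that only needs datetime behaviour will often work with the Timestamp as it is.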

Note: a column of Timestamps is really just an int64 column of epoch nanoseconds, i.e. the same as numpy datetime64[ns] (which you'll see is the dtype of Timestamp columns).
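One way to see this for yourself is sketched below. The two sample dates are made up, and the explicit cast to datetime64[ns] is there only so the example behaves the same whatever resolution your pandas version infers when parsing strings:

```python
import pandas as pd

# Two sample values; the first matches the row shown in the question.
s = pd.Series(pd.to_datetime(["2010-01-01 23:00:00", "2010-01-02 00:00:00"]))

# Pin the resolution to nanoseconds explicitly.
s = s.astype("datetime64[ns]")
print(s.dtype)

# Reinterpret the underlying buffer as int64: nanoseconds since the epoch.
nanos = s.to_numpy().view("int64")
print(nanos)

# Building a Timestamp from the raw integer gives the original value back.
first = pd.Timestamp(int(nanos[0]))
print(first)
```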

If you must force a column of datetimes, you can use the to_pydatetime method and stop the Series from being converted back by assigning the object dtype. However, this will be both slower and use more space than just using Timestamps (because datetimes are essentially tuples and Timestamps are int64).
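A minimal sketch of that approach, assuming a recent pandas and going through DatetimeIndex to get at to_pydatetime (the sample dates and the name py_dates are illustrative):

```python
import datetime

import pandas as pd

# A datetime64 column, like the one read_csv(parse_dates=...) produces.
s = pd.Series(pd.to_datetime(["2010-01-01 23:00:00", "2010-01-02 00:00:00"]))

# Convert to an array of standard-library datetime objects.
py_array = pd.DatetimeIndex(s).to_pydatetime()

# dtype=object keeps pandas from converting the values back to datetime64.
py_dates = pd.Series(py_array, index=s.index, dtype=object)

print(py_dates.dtype)
print(type(py_dates.iloc[0]))
```

Keep in mind that datetime.datetime only has microsecond resolution, so any nanosecond component would be lost in this conversion.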

Upvotes: 3
