user3895587

Pandas MultiIndex not working with read_csv and datetime objects

I have a problem loading a DataFrame back from CSV when it has a MultiIndex with more than one date level.

I am running the following code:

import datetime

import pandas as pd

date1 = datetime.date.today()
date2 = datetime.date.today().replace(month=1)
date_cols = ['date1', 'date2']
index = pd.MultiIndex.from_product([[date1], [date2]])

# create a dataframe with a single row, indexed by the two dates
df = pd.DataFrame([{'date1': date1, 'date2': date2, 'a': 1, 'b': 2}])
df.set_index(date_cols, inplace=True)
# print the single row -> correct
print(df.loc[index])

# write to csv and load it again
df.to_csv('df.csv')
# to_csv writes the dates as ISO 8601 (YYYY-MM-DD), so dayfirst is not needed here
dfr = pd.read_csv('df.csv', parse_dates=date_cols)
dfr.set_index(date_cols, inplace=True)
# print the single row -> incorrect: the row is not found
# (NaN in the pandas version used here; newer versions may raise KeyError instead)
print(dfr.loc[index])

I expect the same output both times, i.e. the single row of the DataFrame, but the second print statement shows NaN because the index key is not found in the reloaded DataFrame. When I inspect dfr.index, the MultiIndex still contains the two dates, but they now also carry time information, with the time set to 00:00:00.
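A minimal sketch for confirming what changed, under two assumptions that are not in the original code: the dates are fixed (2014-07-31 and 2014-01-31) instead of coming from `today()`, so the result is reproducible, and an in-memory `io.StringIO` buffer stands in for `df.csv`. It prints the type, dtype and first value of the first index level of both frames.

```python
import datetime
import io

import pandas as pd

# Fixed dates (assumption) so the example does not depend on today's date.
date1 = datetime.date(2014, 7, 31)
date2 = datetime.date(2014, 1, 31)
date_cols = ["date1", "date2"]

df = pd.DataFrame([{"date1": date1, "date2": date2, "a": 1, "b": 2}]).set_index(date_cols)

# Round-trip through an in-memory CSV buffer instead of a file on disk.
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
dfr = pd.read_csv(buf, parse_dates=date_cols).set_index(date_cols)

for name, frame in [("df", df), ("dfr", dfr)]:
    level = frame.index.levels[0]
    # df keeps datetime.date objects in an object-dtype level; dfr gets a
    # DatetimeIndex whose Timestamps display with a 00:00:00 time component.
    print(name, type(level).__name__, level.dtype, repr(level[0]))
```

The difference in level type, not in the date values themselves, is what makes the lookup with the original `index` miss.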

Is this a bug?

Answers (1)

Jeff

What you are doing is subtly different: the index levels have different types.

In [31]: df.index.levels[0]
Out[31]: Index([2014-07-31], dtype='object')

In [32]: dfr.index.levels[0]
Out[32]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2014-07-31]
Length: 1, Freq: None, Timezone: None

The initial creation (using MultiIndex.from_product) uses datetime.date objects, which end up in a plain object-dtype Index. I suppose MultiIndex creation could convert these automatically to a DatetimeIndex, rather than keeping a plain Index of dates.
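A small sketch of the level types involved, assuming the same fixed dates as a stand-in for `today()`: building the MultiIndex from plain `datetime.date` objects leaves an object-dtype level, while passing the values through `pd.to_datetime` first yields DatetimeIndex levels, the same kind that `read_csv(parse_dates=...)` produces.

```python
import datetime

import pandas as pd

# Fixed dates (assumption) instead of datetime.date.today().
date1 = datetime.date(2014, 7, 31)
date2 = datetime.date(2014, 1, 31)

# Levels built from plain datetime.date objects stay object-dtype Index levels.
mi_dates = pd.MultiIndex.from_product([[date1], [date2]])
print(type(mi_dates.levels[0]).__name__, mi_dates.levels[0].dtype)

# Converting the inputs with pd.to_datetime first gives DatetimeIndex levels.
mi_ts = pd.MultiIndex.from_product([pd.to_datetime([date1]), pd.to_datetime([date2])])
print(type(mi_ts.levels[0]).__name__, mi_ts.levels[0].dtype)
```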

When reading it back in, a proper DatetimeIndex is created. I'll open an issue to think about this. See here.
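One possible workaround, sketched under the assumption that you control how the original frame is built: convert the date columns with `pd.to_datetime` before `set_index`, and build the lookup key from datetime64 values too, so the in-memory frame, the reloaded frame and the key all use DatetimeIndex levels. As before, the dates are fixed and `io.StringIO` stands in for `df.csv`; this is an illustration, not the only fix.

```python
import datetime
import io

import pandas as pd

# Fixed dates (assumption) instead of datetime.date.today().
date1 = datetime.date(2014, 7, 31)
date2 = datetime.date(2014, 1, 31)
date_cols = ["date1", "date2"]

df = pd.DataFrame([{"date1": date1, "date2": date2, "a": 1, "b": 2}])
# Convert to datetime64 before building the index, matching what
# read_csv(parse_dates=...) produces on the way back in.
df[date_cols] = df[date_cols].apply(pd.to_datetime)
df = df.set_index(date_cols)

buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
dfr = pd.read_csv(buf, parse_dates=date_cols).set_index(date_cols)

# Lookup key with DatetimeIndex levels instead of datetime.date objects.
key = pd.MultiIndex.from_product([pd.to_datetime([date1]), pd.to_datetime([date2])])
print(df.loc[key])
print(dfr.loc[key])
```

Because both frames now share the same level type, the same key finds the row before and after the CSV round trip.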
