I have a problem loading a DataFrame from CSV when it has a MultiIndex with more than one date level.
I am running the following code:
```python
import datetime

import pandas as pd

date1 = datetime.date.today()
date2 = datetime.date.today().replace(month=1)
date_cols = ['date1', 'date2']
index = pd.MultiIndex.from_product([[date1], [date2]])

# create a dataframe with a single row, indexed by the two date columns
df = pd.DataFrame([{'date1': date1, 'date2': date2, 'a': 1, 'b': 2}])
df.set_index(date_cols, inplace=True)
# print the single row -> correct
print(df.loc[index])

# write to csv and load it again
df.to_csv('df.csv')
dfr = pd.read_csv('df.csv', parse_dates=date_cols, dayfirst=True)
dfr.set_index(date_cols, inplace=True)
# print the single row -> incorrect, shows NaN
print(dfr.loc[index])
```
I expect the same output both times, i.e. the single row of the DataFrame, but the second print statement shows NaN because the index is not found in the reloaded DataFrame. When I inspect `dfr.index`, I see that the MultiIndex contains the two dates, but they now also carry time information, with the time set to 00:00:00.
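The observation above can be reproduced without a file. This is a minimal sketch assuming a recent pandas: fixed dates stand in for `date.today()`, an in-memory `io.StringIO` buffer replaces `df.csv`, and `dayfirst=True` is omitted because `to_csv` writes the dates in ISO year-month-day order. It compares the first index entry before and after the round trip.

```python
import datetime
import io

import pandas as pd

# Fixed dates instead of date.today() so the output is reproducible.
date1 = datetime.date(2014, 7, 31)
date2 = date1.replace(month=1)
date_cols = ["date1", "date2"]

df = pd.DataFrame([{"date1": date1, "date2": date2, "a": 1, "b": 2}])
df = df.set_index(date_cols)

# Round-trip through an in-memory buffer instead of a file on disk.
buf = io.StringIO(df.to_csv())
dfr = pd.read_csv(buf, parse_dates=date_cols).set_index(date_cols)

before = df.index[0][0]  # first-level value in the original index
after = dfr.index[0][0]  # the same value after read_csv

print(type(before).__name__)  # date
print(type(after).__name__)   # Timestamp
print(after.time())           # 00:00:00
```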
Is this a bug?
What you are doing is subtly different: the index levels have different types before and after the round trip.
```
In [31]: df.index.levels[0]
Out[31]: Index([2014-07-31], dtype='object')

In [32]: dfr.index.levels[0]
Out[32]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2014-07-31]
Length: 1, Freq: None, Timezone: None
```
The initial creation (using `MultiIndex.from_product`) uses Python `datetime.date` objects, so each level is a plain object-dtype `Index`. In a MultiIndex creation I suppose this could trigger automatic `DatetimeIndex` creation, rather than a plain `Index` of dates.
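To make the contrast concrete, here is a minimal sketch assuming a recent pandas (fixed dates and an in-memory buffer replace `date.today()` and `df.csv`). It builds the lookup MultiIndex twice: once from `datetime.date` objects, as in the question, and once from `pd.Timestamp` values, which gives `DatetimeIndex` levels like the ones `read_csv` produces.

```python
import datetime
import io

import pandas as pd

date1 = datetime.date(2014, 7, 31)
date2 = date1.replace(month=1)
date_cols = ["date1", "date2"]

# Round-trip the question's single-row frame through CSV text.
df = pd.DataFrame([{"date1": date1, "date2": date2, "a": 1, "b": 2}])
buf = io.StringIO(df.to_csv(index=False))
dfr = pd.read_csv(buf, parse_dates=date_cols).set_index(date_cols)

# Lookup index from datetime.date objects: plain object-dtype levels.
date_index = pd.MultiIndex.from_product([[date1], [date2]])
# Lookup index from Timestamps: DatetimeIndex levels, matching dfr's.
ts_index = pd.MultiIndex.from_product(
    [[pd.Timestamp(date1)], [pd.Timestamp(date2)]]
)

print(date_index.levels[0].dtype)         # object
print(type(ts_index.levels[0]).__name__)  # DatetimeIndex

# The Timestamp-based key matches the reloaded index and selects the row.
print(dfr.loc[ts_index])
```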
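When reading it back in, a proper `DatetimeIndex` is created: `parse_dates` turns the columns into datetime64 values at midnight, so a lookup index built from `datetime.date` objects no longer matches. I'll open an issue to think about this. See here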
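Another option, sketched under the same assumptions (recent pandas, fixed dates, an in-memory buffer in place of `df.csv`), is to convert the parsed columns back to `datetime.date` objects with `Series.dt.date` before calling `set_index`. The reloaded levels then have the same object dtype as the original, so the question's date-based `index` works unchanged.

```python
import datetime
import io

import pandas as pd

date1 = datetime.date(2014, 7, 31)
date2 = date1.replace(month=1)
date_cols = ["date1", "date2"]
# Lookup key built exactly as in the question, from datetime.date objects.
index = pd.MultiIndex.from_product([[date1], [date2]])

df = pd.DataFrame([{"date1": date1, "date2": date2, "a": 1, "b": 2}])
df = df.set_index(date_cols)

buf = io.StringIO(df.to_csv())  # to_csv with no path returns the CSV text
dfr = pd.read_csv(buf, parse_dates=date_cols)

# parse_dates yields datetime64 columns (midnight timestamps);
# .dt.date converts each value back to a plain datetime.date object.
for col in date_cols:
    dfr[col] = dfr[col].dt.date
dfr = dfr.set_index(date_cols)

# Both frames now have object-dtype levels of dates, so the same key works.
print(df.loc[index])
print(dfr.loc[index])
```

The trade-off is that object-dtype date levels give up the datetime-specific behaviour of a `DatetimeIndex`, so building the lookup key from `pd.Timestamp` values is generally the better fit if you need date arithmetic or range selection.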