Reputation: 2248
pandas force-converts datetime.date values into pd.tslib.Timestamp values when I call set_index, but ONLY if the index has more than one column. This makes working with and merging different frames a problem, since some end up with Timestamps while others keep datetime.date. A simple example:
import datetime
import pandas as pd
df = pd.DataFrame({'date':[datetime.date(2021,3,3),datetime.date(2021,3,4)],'player':['a','b'],'score':[10,9]})
print(type(df['date'][0]))
<class 'datetime.date'>
df = df.set_index('date')
print(type(df.index.get_level_values('date')[0]))
<class 'datetime.date'>
df = df.reset_index()
print(type(df['date'][0]))
<class 'datetime.date'>
df = df.set_index(['date','player'])
print(type(df.index.get_level_values('date')[0]))
<class 'pandas.tslib.Timestamp'>
df = df.reset_index()
print(type(df['date'][0]))
<class 'pandas.tslib.Timestamp'>
How can I keep them in datetime.date?
[note: pd.__version__ == '0.19.2' due to legacy code, in case it is relevant]
Upvotes: 1
Views: 59
Reputation: 863226
I think this is a bug.
You can use MultiIndex.set_levels to replace the first level with plain dates taken from DatetimeIndex.date:
df = df.set_index(['date','player'])
df.index = df.index.set_levels(df.index.levels[0].date, level=0)
print(type(df.index.get_level_values('date')[0]))
<class 'datetime.date'>
df = df.reset_index()
print(type(df['date'][0]))
<class 'datetime.date'>
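If several frames need the same fix, the set_levels call above can be wrapped in a small helper. The following is only a sketch: the name level_to_dates is made up for illustration, and the isinstance check is an assumption added so that the helper leaves the level alone on pandas versions where it already holds datetime.date objects. It has not been verified on 0.19.2.

```python
import datetime
import pandas as pd

def level_to_dates(index, level):
    """Return a copy of a MultiIndex whose `level` holds datetime.date objects.

    `level_to_dates` is a hypothetical helper name, not part of pandas.
    """
    pos = index.names.index(level)
    values = index.levels[pos]
    if isinstance(values, pd.DatetimeIndex):
        # Timestamps -> plain dates; dtype=object prevents re-inference
        values = pd.Index(values.date, dtype=object)
    # If the level is not a DatetimeIndex it is passed through unchanged
    return index.set_levels(values, level=pos)

# Example frame whose 'date' level is deliberately a DatetimeIndex
df = pd.DataFrame({
    'date': pd.to_datetime(['2021-03-03', '2021-03-04']),
    'player': ['a', 'b'],
    'score': [10, 9],
})
df = df.set_index(['date', 'player'])
df.index = level_to_dates(df.index, 'date')

first = df.index.get_level_values('date')[0]
print(type(first))
```

When checking the result, compare with type(x) is datetime.date rather than isinstance(x, datetime.date), because Timestamp is a subclass of datetime.datetime, which in turn subclasses datetime.date.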
Upvotes: 1
Reputation: 2248
I have found a workaround, but I hope a better answer can be offered, since this approach is fairly inefficient and takes quite a few lines.
Workaround:
df = pd.DataFrame({'date':[datetime.date(2021,3,3),datetime.date(2021,3,4)],'player':['a','b'],'score':[10,9]})
df = df.set_index(['date','player'])
df = df.unstack()
df.index = [d.date() for d in df.index.to_pydatetime()]
df.index.names = ['date']
df = df.stack()
print(type(df.index.get_level_values('date')[0]))
<class 'datetime.date'>
Upvotes: 0