tim654321

Reputation: 2248

When calling set_index with more than one column, datetime.date values are converted to pd.tslib.Timestamps

As per the title, pandas force-converts datetime.date values into pd.tslib.Timestamp values when I call set_index, but ONLY if there is more than one column in the index. This makes working with and merging different frames a problem, since some end up with Timestamps and others stay as datetime.date objects. Super simple example:

import datetime
import pandas as pd

df = pd.DataFrame({'date':[datetime.date(2021,3,3),datetime.date(2021,3,4)],'player':['a','b'],'score':[10,9]})

print(type(df['date'][0]))
<class 'datetime.date'>

df = df.set_index('date')

print(type(df.index.get_level_values('date')[0]))
<class 'datetime.date'>

df = df.reset_index()

print(type(df['date'][0]))
<class 'datetime.date'>

df = df.set_index(['date','player'])

print(type(df.index.get_level_values('date')[0]))
<class 'pandas.tslib.Timestamp'>

df = df.reset_index()

print(type(df['date'][0]))
<class 'pandas.tslib.Timestamp'>

How can I keep them in datetime.date?

[note: pd.__version__ == '0.19.2' due to legacy code, if it is relevant]
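The type checks in the question can be wrapped in a small helper, which may help spot frames whose date level was converted before you merge them. This is a sketch; `index_level_types` is a hypothetical name, not a pandas API, and it relies only on `get_level_values`, which the question already calls on both single-level and multi-level indexes.

```python
import datetime
import pandas as pd

def index_level_types(df):
    """Map each index level name to the Python type of its first value.

    Hypothetical helper: useful for checking whether a date level holds
    datetime.date objects or has been converted to Timestamps.
    """
    return {
        name: type(df.index.get_level_values(i)[0])
        for i, name in enumerate(df.index.names)
    }

df = pd.DataFrame({'date': [datetime.date(2021, 3, 3), datetime.date(2021, 3, 4)],
                   'player': ['a', 'b'], 'score': [10, 9]})

# Compare a single-column index with a two-column index
print(index_level_types(df.set_index('date')))
print(index_level_types(df.set_index(['date', 'player'])))
```

Which type the 'date' level reports for the two-column case depends on the pandas version; the question reports Timestamp on 0.19.2.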

Upvotes: 1

Views: 59

Answers (2)

jezrael

Reputation: 863226

In my opinion it is a bug.

You can use MultiIndex.set_levels to replace the date level with plain dates taken from DatetimeIndex.date:

df = df.set_index(['date','player'])

df.index = df.index.set_levels(df.index.levels[0].date, level=0)  # swap the Timestamp level for datetime.date values

print(type(df.index.get_level_values('date')[0]))
<class 'datetime.date'>

df = df.reset_index()

print(type(df['date'][0]))
<class 'datetime.date'>
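The same set_levels idea can be packaged as a reusable function that takes the level by name or position. This is a sketch under stated assumptions: `level_to_date` is a hypothetical helper, and it assumes the level holds midnight timestamps (or dates already). Timestamps with different times on the same day would collapse to one date, and set_levels would then reject the non-unique level.

```python
import datetime
import pandas as pd

def level_to_date(index, level):
    """Return a copy of MultiIndex `index` with `level` as datetime.date objects.

    Hypothetical helper wrapping MultiIndex.set_levels. Assumes the level's
    values are midnight timestamps or plain dates.
    """
    # Resolve a level name to its integer position
    pos = index.names.index(level) if isinstance(level, str) else level
    # pd.to_datetime accepts both a DatetimeIndex and an object Index of dates;
    # .date then returns an array of plain datetime.date objects
    dates = pd.to_datetime(index.levels[pos]).date
    return index.set_levels(dates, level=pos)

# Usage on the question's frame
df = pd.DataFrame({'date': [datetime.date(2021, 3, 3), datetime.date(2021, 3, 4)],
                   'player': ['a', 'b'], 'score': [10, 9]}).set_index(['date', 'player'])
df.index = level_to_date(df.index, 'date')
print(type(df.index.get_level_values('date')[0]))
```

Going through pd.to_datetime first means the helper does not care whether a given pandas version left the level as dates or converted it to Timestamps.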

Upvotes: 1

tim654321

Reputation: 2248

I have found a workaround, but I do hope a better answer can be offered, since this approach is pretty inefficient and takes quite a few lines.

Workaround:

  1. Set the MultiIndex as normal.
  2. Unstack until only the date column remains as the index.
  3. Convert the values back to datetime.date objects, assign them as the index via a list, and restore the index name.
  4. Stack the DataFrame back to its original layout.
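
import datetime
import pandas as pd

df = pd.DataFrame({'date':[datetime.date(2021,3,3),datetime.date(2021,3,4)],'player':['a','b'],'score':[10,9]})
df = df.set_index(['date','player'])  # on 0.19.2 the date level becomes Timestamps here
df = df.unstack()  # move 'player' into the columns; 'date' is now the only index level
df.index = [d.date() for d in df.index.to_pydatetime()]  # back to datetime.date
df.index.names = ['date']  # assigning a plain list dropped the index name
df = df.stack()  # restore the (date, player) layout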
print(type(df.index.get_level_values('date')[0]))
<class 'datetime.date'>
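If the underlying goal is merging frames, a different option (not taken from either answer here) is to merge on columns after reset_index and normalize the date column first, so both sides hold datetime.date keys regardless of how they were converted. This is a sketch; `normalize_date_column` is a hypothetical helper, and it sidesteps the MultiIndex rather than fixing it.

```python
import datetime
import pandas as pd

def normalize_date_column(df, col='date'):
    """Return a copy of `df` whose `col` column holds plain datetime.date objects.

    Hypothetical helper for flat frames (e.g. after reset_index); works whether
    the column currently holds dates or Timestamps.
    """
    out = df.copy()
    out[col] = pd.to_datetime(out[col]).dt.date
    return out

# One frame keeps datetime.date keys, the other has Timestamps
left = pd.DataFrame({'date': [datetime.date(2021, 3, 3), datetime.date(2021, 3, 4)],
                     'player': ['a', 'b'], 'score': [10, 9]})
right = pd.DataFrame({'date': pd.to_datetime(['2021-03-03', '2021-03-04']),
                      'player': ['a', 'b'], 'rank': [1, 2]})

merged = pd.merge(normalize_date_column(left), normalize_date_column(right),
                  on=['date', 'player'])
print(merged)
```

Since both key columns end up as the same Python type, the merge no longer depends on which frames went through a multi-column set_index.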

Upvotes: 0
