Reputation: 139
I have a pandas DataFrame containing a datetime.date column. When I set a multilevel index, the values in the date column are converted to datetime.datetime objects, which does not happen when setting a single-level index. Is this normal behavior? How can I define a multilevel index while keeping the date type?
import datetime
import pandas as pd
values = [("a", datetime.date(2015, 1, 1), 30.),
          ("a", datetime.date(2015, 1, 2), 25.)]
columns = ["id", "date", "amount"]
df = pd.DataFrame(values, columns=columns)
df_single = df.set_index("date")
df_multi = df.set_index(["id", "date"])
Here is the output:
print(df_multi.index)
# MultiIndex(levels=[['a'], [2015-01-01 00:00:00, 2015-01-02 00:00:00]],
# labels=[[0, 0], [0, 1]],
# names=['id', 'date'])
print(df_single.index)
# Index([2015-01-01, 2015-01-02], dtype='object', name='date')
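A quick way to confirm what each index actually stores is to look at the dtype and the Python type of the first element. This is only a minimal sketch built on the example data above; the names single_dates and multi_dates are made up for illustration, and what gets printed depends on the installed pandas version.

```python
import datetime
import pandas as pd

values = [("a", datetime.date(2015, 1, 1), 30.0),
          ("a", datetime.date(2015, 1, 2), 25.0)]
df = pd.DataFrame(values, columns=["id", "date", "amount"])

# Hypothetical names, used only in this sketch.
single_dates = df.set_index("date").index
multi_dates = df.set_index(["id", "date"]).index.get_level_values("date")

for label, idx in [("single", single_dates), ("multi", multi_dates)]:
    # The dtype and element type show whether a conversion took place.
    print(label, idx.dtype, type(idx[0]).__name__)
```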
For information, I'm using the following versions:
Upvotes: 2
Views: 1412
Reputation: 49812
Let's start with your second question:
How can I define a multilevel index keeping the date type?
Workaround:
It is possible to replace part of an index. In your example, after setting the multilevel index, the datetime level can be replaced with the original date values like this:
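# set_levels returns a new index, so assign it back (the inplace argument was removed in pandas 2.0)
df_multi.index = df_multi.index.set_levels([df['date'].values], level=[1])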
Workaround Result:
>>> print(df_multi.index)
MultiIndex(levels=[[u'a'], [2015-01-01, 2015-01-02]],
labels=[[0, 0], [0, 1]],
names=[u'id', u'date'])
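If the date column contains repeated dates, passing the raw column values as levels may not line up with the index, since levels hold each distinct value only once. A possible variation, sketched here under that assumption, converts the existing level values instead. The helper name restore_date_level and the extra "b" row are made up for this example, and I have not checked it against every pandas version.

```python
import datetime
import pandas as pd


def restore_date_level(index, level):
    """Return a copy of `index` whose `level` holds datetime.date objects."""
    pos = index.names.index(level)
    # Convert each distinct level value. pd.Timestamp accepts both date and
    # datetime-like inputs, so values that are already dates pass through.
    as_dates = [pd.Timestamp(v).date() for v in index.levels[pos]]
    # dtype=object keeps pandas from inferring a datetime64 level again.
    return index.set_levels(pd.Index(as_dates, dtype=object), level=level)


values = [("a", datetime.date(2015, 1, 1), 30.0),
          ("a", datetime.date(2015, 1, 2), 25.0),
          ("b", datetime.date(2015, 1, 1), 10.0)]  # repeated date, added for illustration
df = pd.DataFrame(values, columns=["id", "date", "amount"])
df_multi = df.set_index(["id", "date"])
df_multi.index = restore_date_level(df_multi.index, "date")

dates = list(df_multi.index.get_level_values("date"))
print(dates)
```

Because only the level values are swapped and the integer positions are left alone, the row order and the "id" level stay as they were.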
Why?
To your first question:
Is this a normal behavior?
Well, this is normal in the sense that the code definitely does this. The behavior is a side effect of pandas.core.categorical.Categorical(), which ends up promoting the date to a datetime64 via:
values = _possibly_infer_to_datetimelike(values, convert_dates=True)
I do not know whether the effect you are seeing is by design, but you could open an issue with the pandas project.
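To check whether your installed version still does this promotion, one option is to build a Categorical from the date column through the public pd.Categorical constructor and look at the dtype of its categories. This is just a rough probe that assumes the public constructor follows the same path as the internal one named above; the variable cat is made up for the example, and the printed dtype will differ between versions.

```python
import datetime
import pandas as pd

values = [("a", datetime.date(2015, 1, 1), 30.0),
          ("a", datetime.date(2015, 1, 2), 25.0)]
df = pd.DataFrame(values, columns=["id", "date", "amount"])

# Hypothetical probe: an object dtype here means the dates were kept,
# while a datetime64 dtype means they were promoted.
cat = pd.Categorical(df["date"])
print(cat.categories.dtype, type(cat.categories[0]).__name__)
```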
Upvotes: 1