Ezekiel Kruglick
Ezekiel Kruglick

Reputation: 4686

How can I keep this Pandas column type as datetime when using MultiIndex?

This is so strange I honestly wonder if I am just mis-interpreting things.

I have some code that builds a multi-index from datetime values, but after generating the multi-index, the type has changed to Timestamp. I would like it to stay datetime. Why would I care? Well, another part of the code generates a multi-index in a slightly different way from the same datetimes but in that case it keeps the datettime type - so I can't use update between the two DataFrames because the columns are not seen as equal (yes I checked by comparing the values directly).

On to the minimal code example:

import datetime
import pandas as pd
import numpy as np

iterables = [['bar', 'baz', 'foo', 'qux'], [datetime.date(2014, 10, 3)]]
columns = pd.MultiIndex.from_product(iterables)
df = pd.DataFrame(np.random.randn(4, 4),columns = columns)

This gives:

>>> df.columns[0]
('bar', Timestamp('2014-10-03 00:00:00'))

Note that the type on the date is now Timestamp, not datetime. Down the other portion of my code it stays datettime (which is the correct chain of events, yes?) and so now they both reference the same date but test as not equal so I can't apply DataFrame.update to push data from one to the other. Unfortunately both paths have strong data-driven reasons why they should be done the way they are.

Any workarounds or solutions come to mind? Or did I miss something obvious and the whole premise is silly (which I won't discount as a possibility here somehow - I'm surprised the datetimes test as not equal to the Timestamps)? Is there an easy way to reach in and change the second level types of the multiindex to change those back into datettimes?

edit: Almost forgot: pandas version 0.15.2

Upvotes: 3

Views: 1324

Answers (1)

unutbu
unutbu

Reputation: 879143

In the construction of a MultiIndex, the values get changed into Categoricals.

Categoricals of datetimes automatically get converted to Timestamps:

In [287]: pd.Categorical([datetime.date(2014, 10, 3)])[0]
Out[287]: Timestamp('2014-10-03 00:00:00')

but the Categorical of an Index does not:

In [288]: pd.Categorical(pd.Index([datetime.date(2014, 10, 3)]))[0]
Out[288]: datetime.date(2014, 10, 3)

Therefore,

import datetime
import pandas as pd

iterables = [['bar', 'baz', 'foo', 'qux'], pd.Index([datetime.date(2014, 10, 3)])]
columns = pd.MultiIndex.from_product(iterables)
df = pd.DataFrame(np.random.randn(4, 4),columns=columns)
print(df.columns[0])

yields

('bar', datetime.date(2014, 10, 3))

Note, the easiest way I know to change the MultiIndex of a DataFrame, is to reassign a new MultiIndex. Thus, if df already has a MultiIndex with Timestamps, to convert the MultiIndex to datetimes use:

iterables = [['bar', 'baz', 'foo', 'qux'], pd.Index([datetime.date(2014, 10, 3)])]
columns = pd.MultiIndex.from_product(iterables)
df.columns = columns

Upvotes: 5

Related Questions