Reputation: 249133
First I create a two-level MultiIndex:
import numpy as np
import pandas as pd
ind = pd.MultiIndex.from_product([('X','Y'), ('a','b')])
I can use it like this:
pd.DataFrame(np.zeros((3,4)), columns=ind)
Which gives:
     X         Y
     a    b    a    b
0  0.0  0.0  0.0  0.0
1  0.0  0.0  0.0  0.0
2  0.0  0.0  0.0  0.0
But now I'm trying to do this:
dtype = [('Xa','f8'), ('Xb','i4'), ('Ya','f8'), ('Yb','i4')]
pd.DataFrame(np.zeros(3, dtype), columns=ind)
But that gives:
Empty DataFrame
Columns: [(X, a), (X, b), (Y, a), (Y, b)]
Index: []
I expected something like the previous result, with three rows.
More generally, what I want is to generate a Pandas DataFrame with MultiIndex columns where the columns have distinct types (as in the example, where a is float but b is int).
Upvotes: 2
Views: 497
Reputation: 294238
pd.DataFrame(np.zeros(3, dtype), columns=ind)
Empty DataFrame
Columns: [(X, a), (X, b), (Y, a), (Y, b)]
Index: []
is just the textual representation of the DataFrame, and
Columns: [(X, a), (X, b), (Y, a), (Y, b)]
is the textual representation of its column index.
If you instead check the type of the columns:
df = pd.DataFrame(np.zeros(3, dtype), columns=ind)
print(type(df.columns))
<class 'pandas.indexes.multi.MultiIndex'>
you can see that it is indeed a pd.MultiIndex.
With that out of the way: what I don't understand is why passing the columns to the DataFrame constructor drops the values.
A workaround is to assign the columns after construction:
df = pd.DataFrame(np.zeros(3, dtype))
df.columns = ind
print(df)
     X       Y
     a  b    a  b
0  0.0  0  0.0  0
1  0.0  0  0.0  0
2  0.0  0  0.0  0
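To confirm that this workaround keeps the per-column types the question asks for, you can inspect the dtypes afterwards. This is only a quick check that reuses the `ind` and `dtype` definitions from the question; nothing here is new API.

```python
import numpy as np
import pandas as pd

# Same definitions as in the question.
ind = pd.MultiIndex.from_product([('X', 'Y'), ('a', 'b')])
dtype = [('Xa', 'f8'), ('Xb', 'i4'), ('Ya', 'f8'), ('Yb', 'i4')]

# Build the frame from the structured array first, then swap in the MultiIndex.
df = pd.DataFrame(np.zeros(3, dtype))
df.columns = ind

# Each column should keep the dtype of its field in the structured array.
print(df.dtypes)
```

The `a` columns should report a float dtype and the `b` columns an integer dtype, with all three rows present.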
Upvotes: 1
Reputation: 375445
This looks like a bug, and is worth reporting as an issue on GitHub.
A workaround is to set the columns manually after construction:
In [11]: df1 = pd.DataFrame(np.zeros(3, dtype))
In [12]: df1.columns = ind
In [13]: df1
Out[13]:
     X       Y
     a  b    a  b
0  0.0  0  0.0  0
1  0.0  0  0.0  0
2  0.0  0  0.0  0
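If you would rather not rename columns after the fact, another option is to pair each field of the structured array with its target column tuple and build the frame from a dict. This is a sketch of an alternative, not what the constructor was meant to do with `columns=ind`; the names `arr` and `data` are just illustrative, and it assumes the fields appear in the same order as the entries of `ind`.

```python
import numpy as np
import pandas as pd

ind = pd.MultiIndex.from_product([('X', 'Y'), ('a', 'b')])
dtype = [('Xa', 'f8'), ('Xb', 'i4'), ('Ya', 'f8'), ('Yb', 'i4')]
arr = np.zeros(3, dtype)

# Map each MultiIndex tuple to the matching field of the structured array.
# Assumption: field order in `arr` matches the order of `ind`.
data = {col: arr[name] for col, name in zip(ind, arr.dtype.names)}

# Tuple keys in the dict become MultiIndex columns.
df2 = pd.DataFrame(data)
print(df2.dtypes)
```

Because each column is built from its own field, the float and int dtypes carry over independently.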
Upvotes: 2