Say I have this MultiIndex DataFrame:
>>> df = pandas.DataFrame(index=range(3), columns=pandas.MultiIndex.from_product(
...     (('A', 'B'), ('C', 'D'), ('E', 'F'))))
>>> df
     A                   B
     C         D         C         D
     E    F    E    F    E    F    E    F
0  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
1  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
2  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
>>> df.dtypes
A  C  E    object
      F    object
   D  E    object
      F    object
B  C  E    object
      F    object
   D  E    object
      F    object
How would I set the type of all columns E to float64 and all columns F to int64? I.e., so that df.dtypes returns the following:
A  C  E    float64
      F      int64
   D  E    float64
      F      int64
B  C  E    float64
      F      int64
   D  E    float64
      F      int64
I know about DataFrame.astype, and it works fine for singly indexed DataFrames, but how would I use it with a MultiIndex? In the real code the number of columns is much higher: still three levels, but a couple of million columns.
I've been searching the web and the documentation, but I can't find the answer. It feels like I've misunderstood something about the DataFrame concept and that I'm wrong in wanting what I want.
Thank you in advance!
Upvotes: 4
Views: 1389
Reputation: 402603
Integer columns that contain NaN aren't supported on older versions, because the plain int64 dtype cannot hold missing values. Starting from v0.24, you can use the nullable integer dtype Int64 instead. Select the column slices using pd.IndexSlice, then set the type like this:
import pandas as pd

pd.__version__
# '0.24.2'

# Select every column whose last level is cval, cast it, and assign it back.
for cval, dtype in [('E', 'float64'), ('F', 'Int64')]:
    df.loc[:, pd.IndexSlice[:, :, cval]] = (
        df.loc[:, pd.IndexSlice[:, :, cval]].astype(dtype))
df.dtypes
A  C  E    float64
      F      Int64
   D  E    float64
      F      Int64
B  C  E    float64
      F      Int64
   D  E    float64
      F      Int64
dtype: object
Note that the I in Int64 is capitalized to denote the nullable integer type.
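As an alternative sketch (not what was run above), DataFrame.astype also accepts a dict that maps column keys to dtypes, and with MultiIndex columns each key is the full tuple. This avoids assigning back through .loc, which, as far as I know, may write in place and keep the existing dtype on more recent pandas releases. The names dtype_by_leaf, dtype_map and converted are just illustrative, and I haven't timed this on millions of columns.

```python
import pandas as pd

# Same layout as in the question: 3 rows, 2 x 2 x 2 column MultiIndex, all NaN.
columns = pd.MultiIndex.from_product((("A", "B"), ("C", "D"), ("E", "F")))
df = pd.DataFrame(index=range(3), columns=columns)

# Illustrative mapping from the last column level to the target dtype.
dtype_by_leaf = {"E": "float64", "F": "Int64"}

# One entry per full column key; each key is a tuple such as ("A", "C", "E").
dtype_map = {col: dtype_by_leaf[col[-1]] for col in df.columns}

# astype returns a new DataFrame, so no .loc assignment is involved.
converted = df.astype(dtype_map)
print(converted.dtypes)
```

Building the dict is a single pass over df.columns, so the same pattern should carry over to more levels or other per-level rules by changing how the dtype is looked up from the tuple.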
Upvotes: 3