user9408921

Setting the types (dtypes) of a multiindex DataFrame

Say I have this MultiIndex DataFrame:

>>> import pandas
>>> df = pandas.DataFrame(index=range(3), columns=pandas.MultiIndex.from_product(
...         (('A', 'B'), ('C', 'D'), ('E', 'F'))))
>>> df
     A                   B
     C         D         C         D
     E    F    E    F    E    F    E    F
0  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
1  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
2  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
>>> df.dtypes
A  C  E    object
      F    object
   D  E    object
      F    object
B  C  E    object
      F    object
   D  E    object
      F    object
dtype: object

How would I set the type of all columns E to float64 and all columns F to int64? I.e., so that df.dtypes returns the following:

A  C  E    float64
      F      int64
   D  E    float64
      F      int64
B  C  E    float64
      F      int64
   D  E    float64
      F      int64

I know about DataFrame.astype, and it works fine for singly indexed DataFrames, but how would I use it with a MultiIndex? In the real code the number of columns is much higher: still three levels, but a couple of million columns.

I've been searching the web and the documentation, but I can't find the answer. It feels like I've misunderstood something about the DataFrame concept and that I'm wrong in wanting what I want.

Thank you in advance!

Upvotes: 4

Views: 1389

Answers (1)

cs95

Reputation: 402603

Integer columns that contain NaN aren't supported on older versions of pandas, but starting from v0.24 you can use the nullable integer dtype. Select column slices using pd.IndexSlice, then set the type like this:

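import pandas as pd

pd.__version__
# '0.24.2'

# For each label on the third column level, select the matching columns,
# cast them, and assign the result back.
for cval, dtype in [('E', 'float64'), ('F', 'Int64')]:
    df.loc[:, pd.IndexSlice[:, :, cval]] = (
        df.loc[:, pd.IndexSlice[:, :, cval]].astype(dtype))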

df.dtypes
A  C  E    float64
      F      Int64
   D  E    float64
      F      Int64
B  C  E    float64
      F      Int64
   D  E    float64
      F      Int64
dtype: object
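
On newer releases (pandas 2.0 and later, as far as I can tell), assigning through df.loc[:, ...] tries to write into the existing columns in place, so the loop above may leave the dtypes as object. A minimal sketch that avoids this by replacing each column with df[col] = ... instead. The names level_dtypes and cast_by_last_level are made up for illustration, and a per-column loop like this is likely to be slow with millions of columns:

```python
import pandas as pd

# Same frame as in the question: all NaN, object dtype.
df = pd.DataFrame(
    index=range(3),
    columns=pd.MultiIndex.from_product((("A", "B"), ("C", "D"), ("E", "F"))),
)

# Hypothetical mapping from the last column level to the target dtype.
level_dtypes = {"E": "float64", "F": "Int64"}


def cast_by_last_level(frame, mapping):
    """Return a copy of frame with each column cast by its last-level label."""
    out = frame.copy()
    for col in out.columns:
        # col is a tuple such as ("A", "C", "E"); col[-1] is the last level.
        # out[col] = ... replaces the whole column, so the new dtype sticks.
        out[col] = out[col].astype(mapping[col[-1]])
    return out


df = cast_by_last_level(df, level_dtypes)
print(df.dtypes)
```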

Note the capital I in Int64: it denotes the nullable integer dtype, as opposed to NumPy's int64, which cannot hold missing values.
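
To see the difference between the two spellings, here is a small sketch on a throwaway Series (the variable names are just for illustration):

```python
import numpy as np
import pandas as pd

values = pd.Series([1.0, np.nan, 3.0])

# Plain NumPy int64 has no way to represent a missing value,
# so the cast raises a ValueError.
try:
    values.astype("int64")
    int64_failed = False
except ValueError:
    int64_failed = True

# The nullable extension dtype tracks missing entries separately.
nullable = values.astype("Int64")

print(int64_failed)              # True
print(nullable.dtype)            # Int64
print(nullable.isna().tolist())  # [False, True, False]
```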

Upvotes: 3
