oakca
oakca

Reputation: 1568

Pandas Dataframe indexing error

There is an odd behavior when I try to get the index of a dataframe.

Let us consider following dataframe:

IN:

data['commodity']

OUT:

                            price  max  maxperstep
    Site  Commodity Type
    Mid   Biomass   Stock     6.0  inf         inf
          CO2       Env       0.0  inf         inf
          Coal      Stock     7.0  inf         inf
          Elec      Demand    NaN  NaN         NaN
          Gas       Stock    27.0  inf         inf
          Hydro     SupIm     NaN  NaN         NaN
          Lignite   Stock     4.0  inf         inf
          Slack     Stock   999.0  inf         inf
          Solar     SupIm     NaN  NaN         NaN
          Wind      SupIm     NaN  NaN         NaN
    North Biomass   Stock     6.0  inf         inf
          CO2       Env       0.0  inf         inf
          Coal      Stock     7.0  inf         inf
          Elec      Demand    NaN  NaN         NaN
          Gas       Stock    27.0  inf         inf
          Hydro     SupIm     NaN  NaN         NaN
          Lignite   Stock     4.0  inf         inf
          Slack     Stock   999.0  inf         inf
          Solar     SupIm     NaN  NaN         NaN
          Wind      SupIm     NaN  NaN         NaN
    South Biomass   Stock     6.0  inf         inf
          CO2       Env       0.0  inf         inf
          Coal      Stock     7.0  inf         inf
          Elec      Demand    NaN  NaN         NaN
          Gas       Stock    27.0  inf         inf
          Hydro     SupIm     NaN  NaN         NaN
          Lignite   Stock     4.0  inf         inf
          Slack     Stock   999.0  inf         inf
          Solar     SupIm     NaN  NaN         NaN
          Wind      SupIm     NaN  NaN         NaN

Select the index='Mid'

IN:

m.commodity = data['commodity'].loc[['Mid']]

OUT:

                       price  max  maxperstep
Site Commodity Type
Mid  Biomass   Stock     6.0  inf         inf
     CO2       Env       0.0  inf         inf
     Coal      Stock     7.0  inf         inf
     Elec      Demand    NaN  NaN         NaN
     Gas       Stock    27.0  inf         inf
     Hydro     SupIm     NaN  NaN         NaN
     Lignite   Stock     4.0  inf         inf
     Slack     Stock   999.0  inf         inf
     Solar     SupIm     NaN  NaN         NaN
     Wind      SupIm     NaN  NaN         NaN

So far everything works perfectly. But there is an odd behavior, when I try to get the index of m.commodity, which is a part of the data['commodity'], I am assuming to get an index level = ['Mid']. Not like the following (South and North are not wanted):

IN:

m.commodity.index

OUT:

MultiIndex(levels=[['Mid', 'North', 'South'], ['Biomass', 'CO2', 'Coal', 'Elec', 'Gas', 'Hydro', 'Lignite', 'Slack', 'Solar', 'Wind'], ['Demand', 'Env', 'Stock', 'SupIm']],
           labels=[[1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [2, 1, 2, 0, 2, 3, 2, 2, 3, 3]],
           names=['Site', 'Commodity', 'Type'])

Is there a way to fix this? or what am I doing wrong?

Expected OUTCOME:

IN:

m.commodity.index

OUT:

MultiIndex(levels=[['Mid'], ['Biomass', 'CO2', 'Coal', 'Elec', 'Gas', 'Hydro', 
                  'Lignite', 'Slack', 'Solar', 'Wind'], ['Demand', 'Env', 'Stock', 'SupIm']],
           labels=[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 
                   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [2, 1, 2, 0, 2, 3, 2, 2, 3, 3]],
           names=['Site', 'Commodity', 'Type'])

Upvotes: 1

Views: 128

Answers (1)

jezrael
jezrael

Reputation: 862471


Edited by Icedkk:

Turns out, this behavior of pandas is common and is a bug mentioned in https://github.com/pandas-dev/pandas/issues/7614 , https://github.com/pandas-dev/pandas/issues/2770 , and https://github.com/pandas-dev/pandas/issues/12822

.remove_unused_levels() is somehow only way to fix this unusual behavior.


You need MultiIndex.remove_unused_levels and assign back:

m.commodity.index = m.commodity.index.remove_unused_levels()
print (m.commodity.index)

MultiIndex(levels=[['Mid'], ['Biomass', 'CO2', 'Coal', 'Elec', 'Gas', 'Hydro', 
                  'Lignite', 'Slack', 'Solar', 'Wind'], ['Demand', 'Env', 'Stock', 'SupIm']],
           labels=[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 
                   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [2, 1, 2, 0, 2, 3, 2, 2, 3, 3]],
           names=['Site', 'Commodity', 'Type'])

And for check first level:

print (m.commodity.index.remove_unused_levels().levels[0])

Index(['Mid'], dtype='object', name='Site')

Upvotes: 1

Related Questions