Reputation: 1568
There is an odd behavior when I try to get the index of a dataframe.
Let us consider following dataframe:
IN:
data['commodity']
OUT:
price max maxperstep
Site Commodity Type
Mid Biomass Stock 6.0 inf inf
CO2 Env 0.0 inf inf
Coal Stock 7.0 inf inf
Elec Demand NaN NaN NaN
Gas Stock 27.0 inf inf
Hydro SupIm NaN NaN NaN
Lignite Stock 4.0 inf inf
Slack Stock 999.0 inf inf
Solar SupIm NaN NaN NaN
Wind SupIm NaN NaN NaN
North Biomass Stock 6.0 inf inf
CO2 Env 0.0 inf inf
Coal Stock 7.0 inf inf
Elec Demand NaN NaN NaN
Gas Stock 27.0 inf inf
Hydro SupIm NaN NaN NaN
Lignite Stock 4.0 inf inf
Slack Stock 999.0 inf inf
Solar SupIm NaN NaN NaN
Wind SupIm NaN NaN NaN
South Biomass Stock 6.0 inf inf
CO2 Env 0.0 inf inf
Coal Stock 7.0 inf inf
Elec Demand NaN NaN NaN
Gas Stock 27.0 inf inf
Hydro SupIm NaN NaN NaN
Lignite Stock 4.0 inf inf
Slack Stock 999.0 inf inf
Solar SupIm NaN NaN NaN
Wind SupIm NaN NaN NaN
Select the index='Mid'
IN:
m.commodity = data['commodity'].loc[['Mid']]
OUT:
price max maxperstep
Site Commodity Type
Mid Biomass Stock 6.0 inf inf
CO2 Env 0.0 inf inf
Coal Stock 7.0 inf inf
Elec Demand NaN NaN NaN
Gas Stock 27.0 inf inf
Hydro SupIm NaN NaN NaN
Lignite Stock 4.0 inf inf
Slack Stock 999.0 inf inf
Solar SupIm NaN NaN NaN
Wind SupIm NaN NaN NaN
So far everything works perfectly. But there is an odd behavior, when I try to get the index of m.commodity, which is a part of the data['commodity'], I am assuming to get an index level = ['Mid']. Not like the following (South and North are not wanted):
IN:
m.commodity.index
OUT:
MultiIndex(levels=[['Mid', 'North', 'South'], ['Biomass', 'CO2', 'Coal', 'Elec', 'Gas', 'Hydro', 'Lignite', 'Slack', 'Solar', 'Wind'], ['Demand', 'Env', 'Stock', 'SupIm']],
labels=[[1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [2, 1, 2, 0, 2, 3, 2, 2, 3, 3]],
names=['Site', 'Commodity', 'Type'])
Is there a way to fix this? or what am I doing wrong?
Expected OUTCOME:
IN:
m.commodity.index
OUT:
MultiIndex(levels=[['Mid'], ['Biomass', 'CO2', 'Coal', 'Elec', 'Gas', 'Hydro',
'Lignite', 'Slack', 'Solar', 'Wind'], ['Demand', 'Env', 'Stock', 'SupIm']],
labels=[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [2, 1, 2, 0, 2, 3, 2, 2, 3, 3]],
names=['Site', 'Commodity', 'Type'])
Upvotes: 1
Views: 128
Reputation: 862471
Edited by Icedkk:
Turns out, this behavior of pandas is common and is a bug mentioned in https://github.com/pandas-dev/pandas/issues/7614 , https://github.com/pandas-dev/pandas/issues/2770 , and https://github.com/pandas-dev/pandas/issues/12822
.remove_unused_levels() is somehow only way to fix this unusual behavior.
You need MultiIndex.remove_unused_levels
and assign back:
m.commodity.index = m.commodity.index.remove_unused_levels()
print (m.commodity.index)
MultiIndex(levels=[['Mid'], ['Biomass', 'CO2', 'Coal', 'Elec', 'Gas', 'Hydro',
'Lignite', 'Slack', 'Solar', 'Wind'], ['Demand', 'Env', 'Stock', 'SupIm']],
labels=[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [2, 1, 2, 0, 2, 3, 2, 2, 3, 3]],
names=['Site', 'Commodity', 'Type'])
And for check first level:
print (m.commodity.index.remove_unused_levels().levels[0])
Index(['Mid'], dtype='object', name='Site')
Upvotes: 1