python pandas dataframe indexing multi-index

Reputation: 1439

Pandas set_levels on MultiIndex: Level values must be unique

Given a DataFrame df

                    Value
Category Pool Class      
A        1.0  1.0       1
              9.0       2
B        1.0  1.0       3
C        1.0  1.0       4
              5.0       5

I want to convert the levels Pool and Class to integers without reset_index (see below).

I tried using a combination of get_level_values and set_levels like so

for c in ['Pool', 'Class']:
    df.index.set_levels(df.index.get_level_values(c).astype(int), level=c, inplace=True)

However, this raises

ValueError: Level values must be unique: [1, 1, 1, 1, 1] on level 1

To understand what happens, I also tried using verify_integrity=False. Then

df.index.set_levels(df.index.get_level_values('Class').astype(int),
                    level='Class', verify_integrity=False, inplace=True)

produces

                    Value
Category Pool Class      
A        1.0  1         1
              1         2
B        1.0  1         3
C        1.0  1         4
              9         5

whereas my goal is to obtain

                    Value
Category Pool Class      
A        1.0  1         1
              9         2
B        1.0  1         3
C        1.0  1         4
              5         5

How can I achieve this properly? Is chaining get_level_values and set_levels the correct way to do it? Why can't pandas set the level properly after it has been transformed with astype?

I guess you could work with reset_index and set_index, but then what is the benefit of having the set_levels method?

d = {'Category': str, 'Pool': int, 'Class': int}
df.reset_index(drop=False, inplace=True)
for k, v in d.items():
    df[k] = df[k].astype(v)

df.set_index(list(d.keys()), inplace=True)

Upvotes: 10

Answers (4)

elaz

Reputation: 3

I found that the other answers either did not work for me or were too confusing to implement. What did work was to bypass set_index: build the complete list of column tuples outside the DataFrame, call it x, and assign it with df.columns = x. This replaces the columns even when individual elements of the tuples are repeated within a level. I think the idea is that, while a level may contain non-unique elements, the tuples themselves are unique. I am just guessing here, but it does work. I have not tested what happens if two of the tuples are identical. The snippet:

# dfT was dfT = yf.download(...)
dfT50 = dfT.rolling(window=50, min_periods=1).mean()  # I need dfT50.columns != dfT.columns for a concat
indx0 = list(dfT.columns.get_level_values(level=0))  # indx0 is not unique
indx1 = list(dfT.columns.get_level_values(level=1))  # indx1 is not unique
newnames = [name + '_SMA50' for name in indx1]
x = list(zip(indx0, newnames))  # form list of tuples (like dfT.columns)
dfT50.columns = x  # resets dfT50.columns; maybe because x has unique elements
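
Since the snippet above depends on downloaded data, here is a small self-contained sketch of the same idea. The column labels and prices are made up for illustration, and I build the new columns with pd.MultiIndex.from_arrays instead of a list of tuples, which I believe is equivalent here.

```python
import pandas as pd

# Hypothetical stand-in for the downloaded frame: two-level columns
# where neither level is unique on its own.
cols = pd.MultiIndex.from_product([["Close", "Open"], ["AAPL", "MSFT"]])
dfT = pd.DataFrame([[1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0]], columns=cols)

dfT50 = dfT.rolling(window=50, min_periods=1).mean()

# Rebuild the columns from full per-column arrays, suffixing level 1.
dfT50.columns = pd.MultiIndex.from_arrays([
    dfT.columns.get_level_values(0),
    [f"{name}_SMA50" for name in dfT.columns.get_level_values(1)],
])

# The renamed columns no longer clash with the originals.
combined = pd.concat([dfT, dfT50], axis=1)
```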

Upvotes: -2

Attila the Fun

Reputation: 417

To get the integer position that corresponds to a level name stored in variable k, you can use:

df.index.names.index(k)

So if, like OP, you have a dict of level names and types, simply do:

levels = [df.index.levels[df.index.names.index(k)].astype(v)
          for k, v in d.items()]
df.index = df.index.set_levels(levels=levels, level=d.keys())

Or, the same thing in a method chain:

df.set_index(
    df.index.set_levels(
        [df.index.levels[df.index.names.index(k)].astype(v)
         for k, v in d.items()],
        level=d.keys())
)...

Setup for OP's DataFrame and dict:

import pandas as pd

df = pd.DataFrame(
    range(1, 6),
    index=pd.MultiIndex.from_tuples(
        [
            ('A', 1., 1.),
            ('A', 1., 9.),
            ('B', 1., 1.),
            ('C', 1., 1.),
            ('C', 1., 5.)
        ],
        names=['Category', 'Pool', 'Class']
    ),
    columns=['Value']
)
d = {'Category': str, 'Pool': int, 'Class': int}

Upvotes: 0

Eran

Reputation: 844

The following function can be used as a complement to get_level_values:

def set_level_values(midx, level, values):
    full_levels = list(zip(*midx.values))
    names = midx.names
    if isinstance(level, str):
        if level not in names:
            raise ValueError(f'No level {level} in MultiIndex')
        level = names.index(level)
    if len(full_levels[level]) != len(values):
        raise ValueError('Values must be of the same size as original level')
    full_levels[level] = values
    return pd.MultiIndex.from_arrays(full_levels, names=names)

Using this function, the solution for the original question would be:

for c in ['Pool', 'Class']:
    df.index = set_level_values(df.index, level=c, values=df.index.get_level_values(c).astype(int))
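
If you would rather not carry a helper around, the same rebuild-from-full-arrays idea can be sketched in one expression with pd.MultiIndex.from_arrays. This assumes the question's df and the question's dict d of level names to types, with d listing every level in index order. It is only an illustration, not a replacement for the function above.

```python
import pandas as pd

# The question's DataFrame, rebuilt so the example runs on its own.
df = pd.DataFrame(
    {"Value": range(1, 6)},
    index=pd.MultiIndex.from_tuples(
        [("A", 1.0, 1.0), ("A", 1.0, 9.0), ("B", 1.0, 1.0),
         ("C", 1.0, 1.0), ("C", 1.0, 5.0)],
        names=["Category", "Pool", "Class"],
    ),
)
d = {"Category": str, "Pool": int, "Class": int}

# Cast the full per-row values of each level, then build a fresh index.
# Assumes d names every level, in the same order as df.index.names.
rebuilt = pd.MultiIndex.from_arrays(
    [df.index.get_level_values(name).astype(dtype) for name, dtype in d.items()],
    names=list(d),
)
df.index = rebuilt
```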

Upvotes: 5

jpp

Reputation: 164743

You can access the unique values of each index level directly via pd.MultiIndex.levels and feed them to pd.MultiIndex.set_levels (shown here for Class, which is level 2):

df.index = df.index.set_levels(df.index.levels[2].astype(int), level=2)

print(df)

                     Value
Category Pool Class       
A        1.0  1          1
              9          2
B        1.0  1          3
C        1.0  1          4
              5          5
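
As for why the original attempt fails: as far as I understand it, set_levels expects the unique labels of a level (what df.index.levels holds), while get_level_values returns one label per row. The index stores a separate array of integer codes that point into the levels, so passing per-row values with verify_integrity=False just relabels the existing codes. A minimal sketch on the question's data, also converting Pool (the loop and variable names are my own):

```python
import pandas as pd

df = pd.DataFrame(
    {"Value": range(1, 6)},
    index=pd.MultiIndex.from_tuples(
        [("A", 1.0, 1.0), ("A", 1.0, 9.0), ("B", 1.0, 1.0),
         ("C", 1.0, 1.0), ("C", 1.0, 5.0)],
        names=["Category", "Pool", "Class"],
    ),
)

idx = df.index
class_levels = idx.levels[2].tolist()                   # unique labels of Class
class_codes = idx.codes[2].tolist()                     # per-row positions into the levels
per_row = idx.get_level_values("Class").astype(int).tolist()  # one label per row

# What happens if per-row values are used as if they were levels:
# each code now looks up the wrong label.
mislabelled = [per_row[code] for code in class_codes]

# Converting the unique levels instead keeps the codes valid.
new_idx = idx
for name in ["Pool", "Class"]:
    pos = new_idx.names.index(name)
    new_idx = new_idx.set_levels(new_idx.levels[pos].astype(int), level=name)
df.index = new_idx
```

Here the Class levels are [1.0, 5.0, 9.0] with codes [0, 2, 0, 0, 1], so using the per-row values [1, 9, 1, 1, 5] as levels makes the rows read 1, 1, 1, 1, 9, which is the scrambled output in the question.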

Upvotes: 15
