Zhubarb

Reputation: 11895

Pandas Series .loc[] access error after appending

I have a pandas Series with a MultiIndex, multi_df, shown below. I want to append a new entry (new_series) to it and call the result multi_df_appended. However, I don't understand why multi_df and multi_df_appended behave differently when I try to access a MultiIndex key that does not exist.

Below is the code that reproduces the problem. I want the penultimate line of code, multi_df_appended.loc['five', 'black', 'hard', 'square'], to return an empty Series, as it does with multi_df, but instead I get the error shown. What am I doing wrong here?

import pandas as pd

df = pd.DataFrame({'id': range(1, 9),
                   'code': ['one', 'one', 'two', 'three',
                            'two', 'three', 'one', 'two'],
                   'colour': ['black', 'white', 'white', 'white',
                              'black', 'black', 'white', 'white'],
                   'texture': ['soft', 'soft', 'hard', 'soft', 'hard',
                               'hard', 'hard', 'hard'],
                   'shape': ['round', 'triangular', 'triangular', 'triangular', 'square',
                             'triangular', 'round', 'triangular']
                   }, columns=['id', 'code', 'colour', 'texture', 'shape'])
# index by the four categorical columns, sort the index, and keep 'id' as the values
multi_df = df.set_index(['code', 'colour', 'texture', 'shape']).sort_index()['id']

# try to access a non-existing multi-index combination:
multi_df.loc['five', 'black', 'hard', 'square']
# Series([], dtype: int64)   <- returns an empty Series as desired/expected

# append multi_df with a new row 
new_series = pd.Series([9], index = [('four', 'black', 'hard', 'round')] )  
multi_df_appended = multi_df.append(new_series)

# now try again to access a non-existing multi-index combination:
multi_df_appended.loc['five', 'black', 'hard', 'square']
# error: 'MultiIndex lexsort depth 0, key was length 4'   <- now instead of the empty Series, I get an error!?

Upvotes: 1

Views: 280

Answers (1)

Zhubarb

Reputation: 11895

As @Jeff answered, appending leaves the MultiIndex unsorted (lexsort depth 0). If I call .sortlevel(0) after appending and then use .loc[] with a key that is not in the index, I no longer get the "lexsort depth" error:

multi_df_appended_sorted = multi_df.append(new_series).sortlevel(0)
multi_df_appended_sorted.loc['five', 'black', 'hard', 'square']
# Series([], dtype: int64)
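
For anyone on a newer pandas, here is a rough sketch of the same idea. As far as I know, Series.append and sortlevel have been removed in recent releases, so this assumes pd.concat and sort_index() as the replacements. The three-row series is a cut-down stand-in for multi_df, not the original data. I also believe newer versions raise a KeyError for a missing key in .loc[] instead of returning an empty Series, so the lookup below uses an index.isin mask (my own choice, not part of the answer above) to get the empty Series back.

```python
import pandas as pd

names = ['code', 'colour', 'texture', 'shape']

# Cut-down stand-in for multi_df (assumption: three rows are enough to show the effect)
multi_df = pd.Series(
    [1, 2, 3],
    index=pd.MultiIndex.from_tuples(
        [('one', 'black', 'soft', 'round'),
         ('three', 'white', 'soft', 'triangular'),
         ('two', 'black', 'hard', 'square')],
        names=names),
    name='id',
)

new_series = pd.Series(
    [9],
    index=pd.MultiIndex.from_tuples([('four', 'black', 'hard', 'round')],
                                    names=names),
    name='id',
)

# pd.concat puts the new row at the end, so the index is no longer sorted
appended = pd.concat([multi_df, new_series])

# sort_index() restores a lexsorted MultiIndex
appended_sorted = appended.sort_index()

# Boolean mask lookup: yields an empty Series when the key is absent
missing_key = ('five', 'black', 'hard', 'square')
result = appended_sorted[appended_sorted.index.isin([missing_key])]

print(appended.index.is_monotonic_increasing)
print(appended_sorted.index.is_monotonic_increasing)
print(len(result))
```

The mask does not depend on the index being sorted, but sorting after each batch of appends still seems worthwhile, since the label-based lookups and slices are the ones that complain about lexsort depth.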

Upvotes: 2
