Unexpected behavior of .loc on multilevel indexed dataframes

Question

I encountered a behavior of .loc on dataframes with a multilevel index that I can't explain.

The setup:

import pandas as pd
df = pd.DataFrame({'ID': [1, 2, 3, 4],
                   'DT': [2018, 2018, 2017, 2018],
                   'F1': [0, 1, 0, 0],
                   'F2': [0, 0, 1, 0]})

df.loc[5] = [5, 2019, 1, 0]
df

Up to now everything is fine, and the dataframe looks like this (note that a row with index 5 has been inserted):

   ID    DT  F1  F2
0   1  2018   0   0
1   2  2018   1   0
2   3  2017   0   1
3   4  2018   0   0
5   5  2019   1   0

Now create a copy with a multilevel index on 'ID' and 'DT' and use it with loc:

indexed = df.set_index(['ID', 'DT'], inplace=False)
indexed.loc[(2, 2018)]

This still works and outputs the values corresponding to the given index values:

F1    1
F2    0
Name: (2, 2018), dtype: int64

An existing row can also be updated this way:

indexed.loc[(2, 2018)] = [1, 4]

Now try to insert a new row the same way as above with the single-level index:

indexed.loc[(1, 2019)] = [3, 4]

This raises an exception:

ValueError: cannot set using a multi-index selection indexer with a different length than the value

The dataframe was changed anyway, as if loc had interpreted 2019 as the name of a column. It now looks like this:

         F1  F2  2019
ID DT                
1  2018   0   0   NaN
2  2018   1   0   NaN
3  2017   0   1   NaN
4  2018   0   0   NaN
5  2019   1   0   NaN

Can anybody explain this strange behavior, or is that a bug?

jezrael · Accepted Answer

Use : to select all columns, both when adding a new row and when updating an existing one. Omitting the : is a shortcut that unfortunately works only for updates:

indexed.loc[(2, 2018), :] = [1, 4]
indexed.loc[(1, 2019), :] = [3, 4]
print(indexed)
          F1   F2
ID DT            
1  2018  0.0  0.0
2  2018  1.0  4.0
3  2017  0.0  1.0
4  2018  0.0  0.0
5  2019  1.0  0.0
1  2019  3.0  4.0
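
For reference, here is a minimal self-contained sketch of the same idea, rebuilt from the question's data (without the extra row 5). It also shows one plausible reason the shortcut is ambiguous: Python passes `a, b` and `(a, b)` to .loc as the same tuple, so a bare two-element tuple can be read either as a two-level row key or as a row label plus a column label. The variable names `same_lookup` and `new_row` are only illustrative, and the resulting dtypes may differ between pandas versions (the output above shows an upcast to float).

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4],
                   'DT': [2018, 2018, 2017, 2018],
                   'F1': [0, 1, 0, 0],
                   'F2': [0, 0, 1, 0]})
indexed = df.set_index(['ID', 'DT'])

# Both spellings reach .loc as the identical tuple (1, 'F1'), so the
# syntax alone cannot distinguish "row key" from "row label, column label".
same_lookup = df.loc[1, 'F1'] == df.loc[(1, 'F1')]

# With an explicit ':' for the columns, the tuple can only be the row key.
indexed.loc[(2, 2018), :] = [1, 4]   # update an existing row
indexed.loc[(1, 2019), :] = [3, 4]   # add a new row

new_row = indexed.loc[(1, 2019), :]  # read the new row back
print(indexed)
```

Spelling out the column part on every assignment avoids depending on how a given pandas version resolves the bare tuple.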
