Reputation: 4521
I encountered a behavior of .loc for DataFrames with a multilevel index that I can't explain.
The setup:
import pandas as pd
df = pd.DataFrame({'ID': [1, 2, 3, 4],
                   'DT': [2018, 2018, 2017, 2018],
                   'F1': [0, 1, 0, 0],
                   'F2': [0, 0, 1, 0]})
df.loc[5] = [5, 2019, 1, 0]
df
Up to this point everything is fine, and the result looks like this (note that a row with index 5 has been inserted):
   ID    DT  F1  F2
0   1  2018   0   0
1   2  2018   1   0
2   3  2017   0   1
3   4  2018   0   0
5   5  2019   1   0
Now create a copy with a multilevel index on 'ID' and 'DT' and use it with loc:
indexed= df.set_index(['ID', 'DT'], inplace=False)
indexed.loc[(2, 2018)]
This still works and outputs the values corresponding to the given index values:
F1    1
F2    0
Name: (2, 2018), dtype: int64
An existing row can also be updated this way:
indexed.loc[(2, 2018)]= [1, 4]
Now try to insert a new row the same way as was done above with the single-level index:
indexed.loc[(1, 2019)]= [3, 4]
This raises an exception:
ValueError: cannot set using a multi-index selection indexer with a different length than the value
In addition, the DataFrame was modified as if loc had interpreted 2019 as the name of a column, so it now looks like this:
         F1  F2  2019
ID DT
1  2018   0   0   NaN
2  2018   1   0   NaN
3  2017   0   1   NaN
4  2018   0   0   NaN
5  2019   1   0   NaN
Can anybody explain this strange behavior, or is that a bug?
Upvotes: 3
Views: 212
Reputation: 862691
Pass : as the column indexer to select all columns. This works both for inserting a new row and for updating an existing one. Omitting the : is a shortcut that unfortunately works only for updates:
indexed.loc[(2, 2018), :]= [1, 4]
indexed.loc[(1, 2019), :]= [3, 4]
print (indexed)
          F1   F2
ID DT
1  2018  0.0  0.0
2  2018  1.0  4.0
3  2017  0.0  1.0
4  2018  0.0  0.0
5  2019  1.0  0.0
1  2019  3.0  4.0
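For reference, here is a self-contained sketch of the whole sequence, rebuilt from the setup in the question. The checks at the end are only illustrative additions, and the resulting dtypes (float as printed above, or int) may differ between pandas versions, so the sketch does not rely on them.

```python
import pandas as pd

# Setup copied from the question.
df = pd.DataFrame({'ID': [1, 2, 3, 4],
                   'DT': [2018, 2018, 2017, 2018],
                   'F1': [0, 1, 0, 0],
                   'F2': [0, 0, 1, 0]})
df.loc[5] = [5, 2019, 1, 0]

indexed = df.set_index(['ID', 'DT'])

# The explicit column slice ":" marks the tuple as a row key only,
# so both statements address a single row across all columns.
indexed.loc[(2, 2018), :] = [1, 4]   # update an existing row
indexed.loc[(1, 2019), :] = [3, 4]   # insert a new row

# Illustrative checks: no stray "2019" column, and the new row exists.
print(list(indexed.columns))
print((1, 2019) in indexed.index)
print(indexed.loc[(1, 2019)].tolist())
```

With the column slice written out, the key (1, 2019) is unambiguously a row label, so the new row is appended instead of a column named 2019 being created.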
Upvotes: 2