Reputation: 137
I have trouble with adding a single row to a MultiIndexed DataFrame in an efficient way. By adding the row, the MultiIndex is flattened to a simple Index of Tuples. Strangely this is not a problem for MultiIndexed columns.
System info:
Python 3.6.1 |Continuum Analytics, Inc.| (default, Mar 22 2017, 19:25:17)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> pd.__version__
'0.19.2'
Example data: A DataFrame with both MultiIndex rows and columns
import numpy as np
import pandas as pd
index = pd.MultiIndex(levels=[['bar', 'foo'], ['one', 'two']],
labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
names=['row_0', 'row_1'])
columns = pd.MultiIndex(levels=[['dull', 'shiny'], ['a', 'b']],
labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
names=['col_0', 'col_1'])
df = pd.DataFrame(np.ones((4,4)),columns=columns, index=index)
print(df)
col_0 dull shiny
col_1 a b a b
row_0 row_1
bar one 1.0 1.0 1.0 1.0
two 1.0 1.0 1.0 1.0
foo one 1.0 1.0 1.0 1.0
two 1.0 1.0 1.0 1.0
It's no problem to add an additional column to the DataFrame:
df['last_col'] = 42 #define a new column and assign a value
print(df)
col_0 dull shiny last_col
col_1 a b a b
row_0 row_1
bar one 1.0 1.0 1.0 1.0 42
two 1.0 1.0 1.0 1.0 42
foo one 1.0 1.0 1.0 1.0 42
two 1.0 1.0 1.0 1.0 42
However, if i do the same for adding a row (by using loc), the MultiIndex is flattened to a simple Index of Tuples:
df.loc['last_row'] = 43 #define a new row and assign a value
print(df)
col_0 dull shiny last_col
col_1 a b a b
(bar, one) 1.0 1.0 1.0 1.0 42
(bar, two) 1.0 1.0 1.0 1.0 42
(foo, one) 1.0 1.0 1.0 1.0 42
(foo, two) 1.0 1.0 1.0 1.0 42
last_row 43.0 43.0 43.0 43.0 43
Does anyone has an idea how to add a row without flattening the index in a both simple and efficient way? Thank you very much!!
Upvotes: 3
Views: 478
Reputation: 862801
I think you need tuple with define both values of MultiIndex
:
df.loc[('last_row', 'a'), :] = 43
print(df)
col_0 dull shiny
col_1 a b a b
row_0 row_1
bar one 1.0 1.0 1.0 1.0
two 1.0 1.0 1.0 1.0
foo one 1.0 1.0 1.0 1.0
two 1.0 1.0 1.0 1.0
last_row a 43.0 43.0 43.0 43.0
For column it works similar:
df[('last_col', 'a')] = 43
print(df)
col_0 dull shiny last_col
col_1 a b a b a
row_0 row_1
bar one 1.0 1.0 1.0 1.0 43
two 1.0 1.0 1.0 1.0 43
foo one 1.0 1.0 1.0 1.0 43
two 1.0 1.0 1.0 1.0 43
EDIT:
It seems you need define columns names, if need all use :
:
df.loc['last_row',:] = 43
print(df)
col_0 dull shiny
col_1 a b a b
row_0 row_1
bar one 1.0 1.0 1.0 1.0
two 1.0 1.0 1.0 1.0
foo one 1.0 1.0 1.0 1.0
two 1.0 1.0 1.0 1.0
last_row 43.0 43.0 43.0 43.0
If level is not defined is added empty string:
print(df.index)
MultiIndex(levels=[['bar', 'foo', 'last_row'], ['one', 'two', '']],
labels=[[0, 0, 1, 1, 2], [0, 1, 0, 1, 2]],
names=['row_0', 'row_1'])
df.loc['last_row','dull'] = 43
print(df)
col_0 dull shiny
col_1 a b a b
row_0 row_1
bar one 1.0 1.0 1.0 1.0
two 1.0 1.0 1.0 1.0
foo one 1.0 1.0 1.0 1.0
two 1.0 1.0 1.0 1.0
last_row 43.0 43.0 NaN NaN
df.loc['last_row', ('dull', 'a')] = 43
print(df)
col_0 dull shiny
col_1 a b a b
row_0 row_1
bar one 1.0 1.0 1.0 1.0
two 1.0 1.0 1.0 1.0
foo one 1.0 1.0 1.0 1.0
two 1.0 1.0 1.0 1.0
last_row 43.0 NaN NaN NaN
Upvotes: 4