user3556757

Reputation: 3609

Manipulating MultiIndex columns in pandas

I have a simple pandas DataFrame with MultiIndex columns. I'm trying to add additional sub-columns, but pandas warns me off with:

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

I cannot manage to get the right indexing incantation to make this work.

Below is a code fragment with a simple, non-hierarchical example of the sort of columns I want to add. It is followed by a hierarchical example where I show that I can add new top-level columns but cannot manipulate individual sub-columns:

import pandas as pd
import numpy as np


#simple example that works: add two columns to a non-hierarchical frame
sdf = pd.DataFrame(np.random.randn(6,4),columns=list('ABCD'))
sdf['E'] = 7
sdf['F'] = sdf['A'].diff(-1)


#hierarchical example
df = pd.DataFrame({('co1', 'price'): {0: 1, 1: 2, 2: 12, 3: 14, 4: 15},
                   ('co1', 'size'): {0: 1, 1: 5, 2: 9, 3: 13, 4: 17},
                   ('co2', 'price'): {0: 2, 1: 6, 2: 10, 3: 14, 4: 18},
                   ('co2', 'size'): {0: 3, 1: 7, 2: 11, 3: 15, 4: 19}})

df.index.names = ['run']
df.columns.names = ['security', 'characteristic']

#I can add a new top level column
df['newTopLevel?'] = "yes"


# I cannot manipulate values of existing sub-level columns; both lines below warn:
# "A value is trying to be set on a copy of a slice from a DataFrame.
#  Try using .loc[row_indexer,col_indexer] = value instead"

df['co1']['size'] = "gross"
df['co1']['price'] = df['co1']['price']*2


#I cannot add a new sub-level column
df['co1']['new_sub_col'] = "fails"

I seem to be missing some fundamental understanding of this issue, which is frustrating, as I've read the O'Reilly "Python for Data Analysis" book by the pandas author pretty closely! Ugh.
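To see why the chained form cannot add a sub-column, it helps to split df['co1']['new_sub_col'] = ... into its two steps. This is a minimal sketch, assuming a reasonably recent pandas; the name intermediate is purely illustrative, and warnings are silenced only because the exact warning class (SettingWithCopyWarning or a chained-assignment warning) differs between pandas versions.

```python
import warnings

import pandas as pd

# Rebuild the question's frame (no backslashes are needed inside the braces).
df = pd.DataFrame({('co1', 'price'): [1, 2, 12, 14, 15],
                   ('co1', 'size'): [1, 5, 9, 13, 17],
                   ('co2', 'price'): [2, 6, 10, 14, 18],
                   ('co2', 'size'): [3, 7, 11, 15, 19]})
df.index.names = ['run']
df.columns.names = ['security', 'characteristic']

# Step 1 of the chained expression: df['co1'] builds a separate DataFrame
# whose columns are only the 'characteristic' labels under 'co1'.
intermediate = df['co1']
print(list(intermediate.columns))  # ['price', 'size']

# Step 2: the assignment calls __setitem__ on that intermediate object,
# never on df itself, so the new column is added to a throwaway frame.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # warning class varies by pandas version
    df['co1']['new_sub_col'] = 'fails'

print(('co1', 'new_sub_col') in df.columns)  # False: df is unchanged
```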

Upvotes: 2

Views: 848

Answers (1)

Andy Hayden

Reputation: 375485

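To avoid the warning/error, use loc and do each of these in a single assignment. An expression like df['co1']['size'] = ... is chained indexing: df['co1'] first returns a separate DataFrame, so the assignment lands on that intermediate object rather than on df. Passing the full column tuple to .loc sets the values on df in one step: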

In [11]: df.loc[:, ('co1', 'size')] = "gross"

In [12]: df.loc[:, ('co1', 'price')] *= 2

In [13]: df.loc[:, ('co1', 'new_sub_col')] = "fails"  # not anymore

In [14]: df
Out[14]:
security         co1          co2      newTopLevel?         co1
characteristic price   size price size              new_sub_col
run
0                  2  gross     2    3          yes       fails
1                  4  gross     6    7          yes       fails
2                 24  gross    10   11          yes       fails
3                 28  gross    14   15          yes       fails
4                 30  gross    18   19          yes       fails
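For newer pandas versions, here is a self-contained script version of the session above that also carries the question's non-hierarchical examples (a constant column and a diff(-1) column) over to the hierarchical frame. It is a sketch, not the only way to do it: the price_diff name and the loop over ['co1', 'co2'] are hypothetical illustrations. For the string assignment it uses df[('co1', 'size')] = ... instead of .loc, on the assumption that recent pandas releases may emit a FutureWarning when .loc writes a string into an integer column, whereas plain column assignment replaces the column outright.

```python
import pandas as pd

df = pd.DataFrame({('co1', 'price'): [1, 2, 12, 14, 15],
                   ('co1', 'size'): [1, 5, 9, 13, 17],
                   ('co2', 'price'): [2, 6, 10, 14, 18],
                   ('co2', 'size'): [3, 7, 11, 15, 19]})
df.index.names = ['run']
df.columns.names = ['security', 'characteristic']

# Replace an existing sub-column entirely (int64 -> object) by assigning
# to the full (security, characteristic) tuple.
df[('co1', 'size')] = 'gross'

# Modify an existing sub-column in place with one .loc call (no chaining).
df.loc[:, ('co1', 'price')] *= 2

# Add a new sub-column under an existing top-level label, like sdf['E'] = 7.
df.loc[:, ('co1', 'new_sub_col')] = 'added'

# Hierarchical analogue of sdf['F'] = sdf['A'].diff(-1): one hypothetical
# 'price_diff' sub-column per security, computed from that security's price.
for sec in ['co1', 'co2']:
    df[(sec, 'price_diff')] = df[(sec, 'price')].diff(-1)

# New columns are appended at the right; sorting regroups them by security.
df = df.sort_index(axis=1)
print(df)
```

Sorting with sort_index(axis=1) is optional; without it, newly added sub-columns stay at the right-hand end of the frame, as in the output above.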

Upvotes: 1
