uniXVanXcel
uniXVanXcel

Reputation: 806

Adding a column to a Multiindex Dataframe

I would like to add a column SUM to the df1 below. It's a Datetime MultiIndex and the new column SUM should return the sum of the price row.

multex = pd.MultiIndex.from_product([['price',
 'weight','quantity','portfolio'] ,df1.index],names=['Date', 'Stats'])

new_df = pd.DataFrame(index=multex, columns= df1.columns.values.tolist())

Subsequently would like to add a row SUM as well that returns the same value. I've tried the following so far:

df1['SUM']= df1.ix['price'].sum(axis=1) 


                           A      B      C        D     E

2006-04-28 00:00:00                                    
            price        69.62  69.62  6.518   65.09  69.62
            weight
            std
2006-05-01 00:00:00  
            price         71.5   71.5  6.522   65.16   71.5
            weight
            std
2006-05-02 00:00:00   
            price        72.34  72.34  6.669   66.55  72.34
            weight
            std

Upvotes: 2

Views: 3838

Answers (1)

jezrael
jezrael

Reputation: 862481

You can use first DataFrame.sort_index, because error:

KeyError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (2), lexsort depth (1)'

Then use slicers:

df1 = df1.sort_index()

idx = pd.IndexSlice
df1['SUM'] = df1.loc[idx[:,'price'],:].sum(axis=1)
print (df1)
                                A      B      C      D      E      SUM
Date                Stats                                             
2006-04-28 00:00:00 price   69.62  69.62  6.518  65.09  69.62  280.468
                    std       NaN    NaN    NaN    NaN    NaN      NaN
                    weight    NaN    NaN    NaN    NaN    NaN      NaN
2006-05-01 00:00:00 price   71.50  71.50  6.522  65.16  71.50  286.182
                    std       NaN    NaN    NaN    NaN    NaN      NaN
                    weight    NaN    NaN    NaN    NaN    NaN      NaN
2006-05-02 00:00:00 price   72.34  72.34  6.669  66.55  72.34  290.239
                    std       NaN    NaN    NaN    NaN    NaN      NaN
                    weight    NaN    NaN    NaN    NaN    NaN      NaN

df1['SUM'] = df1.loc[(slice(None), slice('price')),:].sum(axis=1)
print (df1)
                                A      B      C      D      E      SUM
Date                Stats                                             
2006-04-28 00:00:00 price   69.62  69.62  6.518  65.09  69.62  280.468
                    std       NaN    NaN    NaN    NaN    NaN      NaN
                    weight    NaN    NaN    NaN    NaN    NaN      NaN
2006-05-01 00:00:00 price   71.50  71.50  6.522  65.16  71.50  286.182
                    std       NaN    NaN    NaN    NaN    NaN      NaN
                    weight    NaN    NaN    NaN    NaN    NaN      NaN
2006-05-02 00:00:00 price   72.34  72.34  6.669  66.55  72.34  290.239
                    std       NaN    NaN    NaN    NaN    NaN      NaN
                    weight    NaN    NaN    NaN    NaN    NaN      NaN

Upvotes: 3

Related Questions