jester_thomas
jester_thomas

Reputation: 173

Is there a way to sum over an index level in a dataframe while preserving index levels that are repeated?

I have a Dataframe that looks something like:

                     2000 2001 2002 2003
Area Item Unit Code  
A    X    j    00    val  val  val  val
A    Y    k    01    val  val  val  val
A    Z    k    10    val  val  val  val
B    X    j    00    val  val  val  val
B    Y    k    01    val  val  val  val
B    Z    k    10    val  val  val  val

I want to sum the areas to produce a DataFrame that looks like:

                     2000 2001 2002 2003
Area Item Unit Code  
C    X    j    00    val  val  val  val
C    Y    k    01    val  val  val  val
C    Z    k    10    val  val  val  val

If I use df = df.sum(level = "Item"), the rest of the index gets dropped and I get (which I expected):

     2000 2001 2002 2003
X    val  val  val  val
Y    val  val  val  val
Z    val  val  val  val

I can do df = pd.concat([df], keys = ["C"], names = ["Area"]), to get solve the "Area" aspect but that doesn't help with the rest of the index.

I can't find a way to sum the area data whilst preserving the structure of the DataFrame.

My actual dataframe is larger than the example I've given (Area values go A B C ... ZY ZZ etc).

Upvotes: 3

Views: 175

Answers (1)

jezrael
jezrael

Reputation: 862771

First use sum by all levels without first Area, then create new column filled by C, add to index by DataFrame.set_index with append=True and last DataFrame.reorder_levels for first position of this column:

print (df)
                     2000  2001  2002  2003
Area Item Unit Code                        
A    X    j    0       10    10    10    10
     Y    k    1       10    10    10    10
     Z    k    10      10    10    10    10
B    X    j    0       10    10    10    10
     Y    k    1       10    10    10    10
     Z    k    10      10    10    10    10


df1 = (df.sum(level = ["Item","Unit","Code"])
         .assign(Area = 'C')
         .set_index('Area', append=True)
         .reorder_levels([3,0,1,2]))
print (df1)
                     2000  2001  2002  2003
Area Item Unit Code                        
C    X    j    0       20    20    20    20
     Y    k    1       20    20    20    20
     Z    k    10      20    20    20    20

Another solution:

df1 = (df.sum(level = ["Item","Unit","Code"])
         .assign(Area = 'C')
         .reset_index()
         .set_index(["Area", "Item","Unit","Code"]))
print (df1)
                     2000  2001  2002  2003
Area Item Unit Code                        
C    X    j    0       20    20    20    20
     Y    k    1       20    20    20    20
     Z    k    10      20    20    20    20

Upvotes: 1

Related Questions