Reputation: 173
I have a Dataframe that looks something like:
2000 2001 2002 2003
Area Item Unit Code
A X j 00 val val val val
A Y k 01 val val val val
A Z k 10 val val val val
B X j 00 val val val val
B Y k 01 val val val val
B Z k 10 val val val val
I want to sum the areas to produce a DataFrame that looks like:
2000 2001 2002 2003
Area Item Unit Code
C X j 00 val val val val
C Y k 01 val val val val
C Z k 10 val val val val
If I use df = df.sum(level = "Item")
, the rest of the index gets dropped and I get (which I expected):
2000 2001 2002 2003
X val val val val
Y val val val val
Z val val val val
I can do df = pd.concat([df], keys = ["C"], names = ["Area"])
, to get solve the "Area" aspect but that doesn't help with the rest of the index.
I can't find a way to sum the area data whilst preserving the structure of the DataFrame.
My actual dataframe is larger than the example I've given (Area
values go A B C ... ZY ZZ
etc).
Upvotes: 3
Views: 175
Reputation: 862771
First use sum
by all levels without first Area
, then create new column filled by C
, add to index by DataFrame.set_index
with append=True
and last DataFrame.reorder_levels
for first position of this column:
print (df)
2000 2001 2002 2003
Area Item Unit Code
A X j 0 10 10 10 10
Y k 1 10 10 10 10
Z k 10 10 10 10 10
B X j 0 10 10 10 10
Y k 1 10 10 10 10
Z k 10 10 10 10 10
df1 = (df.sum(level = ["Item","Unit","Code"])
.assign(Area = 'C')
.set_index('Area', append=True)
.reorder_levels([3,0,1,2]))
print (df1)
2000 2001 2002 2003
Area Item Unit Code
C X j 0 20 20 20 20
Y k 1 20 20 20 20
Z k 10 20 20 20 20
Another solution:
df1 = (df.sum(level = ["Item","Unit","Code"])
.assign(Area = 'C')
.reset_index()
.set_index(["Area", "Item","Unit","Code"]))
print (df1)
2000 2001 2002 2003
Area Item Unit Code
C X j 0 20 20 20 20
Y k 1 20 20 20 20
Z k 10 20 20 20 20
Upvotes: 1