user308827

Reputation: 21981

pandas complicated stacked barplot

I have the following data:

    Year    LandUse     Region  Area
0   2005    Corn        LP  2078875
1   2005    Corn        UP  149102.4
2   2005    Open Lands      LP  271715
3   2005    Open Lands      UP  232290.1
4   2005    Soybeans        LP  1791342
5   2005    Soybeans        UP  50799.12
6   2005    Other Ag        LP  638010.4
7   2005    Other Ag        UP  125527.2
8   2005    Forests/Wetlands        LP  69629.86
9   2005    Forests/Wetlands        UP  26511.43
10  2005    Developed       LP  10225.56
11  2005    Developed       UP  1248.442
12  2010    Corn        LP  2303999
13  2010    Corn        UP  201977.2
14  2010    Open Lands      LP  131696.3
15  2010    Open Lands      UP  45845.81
16  2010    Soybeans        LP  1811186
17  2010    Soybeans        UP  66271.21
18  2010    Other Ag        LP  635332.9
19  2010    Other Ag        UP  257439.9
20  2010    Forests/Wetlands        LP  48124.43
21  2010    Forests/Wetlands        UP  23433.76
22  2010    Developed       LP  7619.853
23  2010    Developed       UP  707.4816

How do I use pandas to make a stacked bar plot that shows Area on the y-axis, uses Region to construct the stacks, and puts Year and LandUse on the x-axis?

Upvotes: 0

Views: 73

Answers (1)

Marius

Reputation: 60130

The main thing with pandas plots is figuring out which shape pandas expects the data to be in. If we reshape so that Year is in the index and different regions are in different columns:

# Assuming that we want to sum the areas across the different
# LandUse values within each region
plot_table = df.pivot_table(index='Year', columns='Region', 
                            values='Area', aggfunc='sum')
plot_table

Out[39]: 
Region           LP           UP
Year                            
2005    4859797.820  585478.6920
2010    4937958.483  595675.3616

Plotting is then straightforward:

plot_table.plot(kind='bar', stacked=True)
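
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes only a subset of the question's rows (Corn and Soybeans), so the totals differ from the output above. The `Agg` backend and the `set_ylabel` call are my additions, not part of the original snippet.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import pandas as pd

# Subset of the question's data (Corn and Soybeans only) to keep the sketch short
df = pd.DataFrame({
    "Year":    [2005, 2005, 2005, 2005, 2010, 2010, 2010, 2010],
    "LandUse": ["Corn", "Corn", "Soybeans", "Soybeans"] * 2,
    "Region":  ["LP", "UP"] * 4,
    "Area":    [2078875, 149102.4, 1791342, 50799.12,
                2303999, 201977.2, 1811186, 66271.21],
})

# One row per Year, one column per Region, areas summed over LandUse
plot_table = df.pivot_table(index="Year", columns="Region",
                            values="Area", aggfunc="sum")

# Each column (region) becomes one layer of the stack
ax = plot_table.plot(kind="bar", stacked=True)
ax.set_ylabel("Area")
ax.figure.tight_layout()
```

Each column of `plot_table` becomes one segment of the stack and each index value becomes one bar, which is why the reshaping step matters.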

Having both Year and LandUse on the x-axis doesn't require much extra work. Put both in the index when creating the table for plotting:

plot_table = df.pivot_table(index=['Year', 'LandUse'], 
                            columns='Region', 
                            values='Area', aggfunc='sum')
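
The two-level table is plotted with the same call as before. Below is a rough sketch, again assuming just the Corn and Soybeans rows from the question rather than the full data. The `Agg` backend and axis label are my additions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import pandas as pd

# Subset of the question's data (Corn and Soybeans only) to keep the sketch short
df = pd.DataFrame({
    "Year":    [2005, 2005, 2005, 2005, 2010, 2010, 2010, 2010],
    "LandUse": ["Corn", "Corn", "Soybeans", "Soybeans"] * 2,
    "Region":  ["LP", "UP"] * 4,
    "Area":    [2078875, 149102.4, 1791342, 50799.12,
                2303999, 201977.2, 1811186, 66271.21],
})

# One row per (Year, LandUse) pair, one column per Region
plot_table = df.pivot_table(index=["Year", "LandUse"],
                            columns="Region",
                            values="Area", aggfunc="sum")

# One bar per (Year, LandUse) pair, stacked by Region
ax = plot_table.plot(kind="bar", stacked=True)
ax.set_ylabel("Area")
ax.figure.tight_layout()
```

This gives one bar for each (Year, LandUse) combination, with LP and UP stacked within each bar.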


Upvotes: 1
