vferraz

Reputation: 469

Stacked area chart (matplotlib) from Pandas pivot table

I have the data in the following format:

import pandas as pd
import matplotlib.pyplot as plt

    Metric  Country  Year    Value
0       2G  Austria  2018  1049522
1       2G  Austria  2019   740746
2       2G  Austria  2020   508452
3       2G  Austria  2021   343667
4       2G  Austria  2022   234456
65      3G  Austria  2018  2133823
66      3G  Austria  2019  1406927
67      3G  Austria  2020  1164042
68      3G  Austria  2021  1043169
69      3G  Austria  2022   920025
130     4G  Austria  2018  7482733
131     4G  Austria  2019  8551865
132     4G  Austria  2020  8982975
133     4G  Austria  2021  9090997
134     4G  Austria  2022  8905121
195     5G  Austria  2018        0
196     5G  Austria  2019        0
197     5G  Austria  2020    41995
198     5G  Austria  2021   188848
199     5G  Austria  2022   553826

I am trying to create an "Area" chart based on the values per year, split by the metrics.

For that, I create a pivot table to aggregate the results, as follows:

pivot_austria = pd.pivot_table(data_austria, index=['Metric'],
                               columns=['Year'],
                               values=['Value'], 
                               aggfunc=sum, 
                               fill_value=0)

Which returns the data in this format:

          Value                                    
Year       2018     2019     2020     2021     2022
Metric                                             
2G      1049522   740746   508452   343667   234456
3G      2133823  1406927  1164042  1043169   920025
4G      7482733  8551865  8982975  9090997  8905121
5G            0        0    41995   188848   553826

But when I try the plot command:

plot = plt.stackplot(pivot_austria.columns, pivot_austria.values, labels = pivot_austria.index)

I get an error:

    return np.array(data, dtype=np.unicode)

ValueError: setting an array element with a sequence

I have tried many ways of plotting this, with and without the pivot, and nothing has worked so far. Does anyone know what I could be doing wrong?

Upvotes: 2

Views: 1018

Answers (2)

Scott Boston

Reputation: 153560

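I am not sure which kind of plot you are trying to generate, but removing the brackets around 'Value' will help. With values=['Value'] the pivot table gets two-level (MultiIndex) columns, so pivot_austria.columns holds tuples such as ('Value', 2018), which stackplot cannot use as x values.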

Let's try this first:

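# Passing values as a string (not a list) keeps the columns as plain years
pivot_austria = pd.pivot_table(data_austria, index=['Metric'],
                               columns=['Year'],
                               values='Value',
                               aggfunc='sum',
                               fill_value=0)

# One stacked layer per metric (row), with the years on the x-axis
plt.stackplot(pivot_austria.columns, pivot_austria.values, labels=pivot_austria.index)
ax = plt.gca()
ax.set_xticks(pivot_austria.columns)
ax.legend(loc='upper left')  # show the metric labels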
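
To see why the brackets matter, here is a minimal sketch that compares the column labels produced by the two forms of the values argument. It uses a small subset of the question's data, and the helper make_pivot is a hypothetical name introduced only for this illustration.

```python
import pandas as pd

# A small subset of the question's data, retyped here for illustration.
data_austria = pd.DataFrame({
    "Metric": ["2G", "2G", "5G", "5G"],
    "Country": ["Austria"] * 4,
    "Year": [2021, 2022, 2021, 2022],
    "Value": [343667, 234456, 188848, 553826],
})


def make_pivot(values):
    """Build the pivot from the question with the given `values` argument."""
    return pd.pivot_table(data_austria, index="Metric", columns="Year",
                          values=values, aggfunc="sum", fill_value=0)


nested = make_pivot(["Value"])  # list -> two-level column labels
flat = make_pivot("Value")      # string -> plain year labels

print(list(nested.columns))  # tuples such as ('Value', 2021)
print(list(flat.columns))    # plain years
```

The tuple labels in the first case are what stackplot chokes on. If you need to keep the list form, nested['Value'] should select the same flat frame.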

Or, as @pask suggests in his answer, let pandas handle it:

# Transpose so that years are on the x-axis and each metric is one area
ax = pivot_austria.T.plot.area()
ax.set_xticks(pivot_austria.columns)

EDIT to display as percentages:

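from matplotlib.ticker import PercentFormatter

# Transpose so that years are rows, then scale by the largest yearly total
yearly = pivot_austria.T
ax = (yearly / yearly.sum(axis=1).max()).plot.area()
ax.set_xticks(yearly.index)
ax.set_ylim(0, 1)
ax.yaxis.set_major_formatter(PercentFormatter(xmax=1, decimals=2))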
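
If what you want is each metric's share within its own year (so every year stacks up to exactly 100%), a possible variation is to divide each row by its own total instead of by a single maximum. This is only a sketch under that assumption. The frame below retypes two years of the question's data with years as rows, and the Agg backend line is there only so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to get a window
import pandas as pd
from matplotlib.ticker import PercentFormatter

# Two years of the question's data, already shaped with years as rows.
yearly = pd.DataFrame(
    {"2G": [343667, 234456], "3G": [1043169, 920025],
     "4G": [9090997, 8905121], "5G": [188848, 553826]},
    index=pd.Index([2021, 2022], name="Year"),
)

# Divide every row by its own total so that each year sums to 1.
share = yearly.div(yearly.sum(axis=1), axis=0)

ax = share.plot.area()
ax.set_xticks(share.index)
ax.set_ylim(0, 1)
# Format the 0..1 fractions on the y-axis as percentages.
ax.yaxis.set_major_formatter(PercentFormatter(xmax=1))
```

With this scaling the top of the stack is flat at 100%, so the chart shows the mix of technologies rather than the growth of the total.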

Upvotes: 5

pask

Reputation: 927

Pandas already includes an easy way to draw area plots.

Try:

pivot_austria.T.plot.area(xticks=pivot_austria.T.index)
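
As a variation, you could probably skip the transpose by pivoting with Year as the index from the start. The sketch below assumes the long-format columns shown in the question and uses only part of its data. The name by_year is made up for this example, and the Agg backend line is there only so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to get a window
import pandas as pd

# Part of the question's data in its original long format.
data_austria = pd.DataFrame({
    "Metric": ["4G", "4G", "4G", "5G", "5G", "5G"],
    "Country": ["Austria"] * 6,
    "Year": [2020, 2021, 2022, 2020, 2021, 2022],
    "Value": [8982975, 9090997, 8905121, 41995, 188848, 553826],
})

# Years as rows and metrics as columns: one stacked area per column.
by_year = data_austria.pivot_table(index="Year", columns="Metric",
                                   values="Value", aggfunc="sum",
                                   fill_value=0)

# Put one tick on each year instead of the default fractional ticks.
ax = by_year.plot.area(xticks=by_year.index)
```

Either layout gives the same chart. The only difference is whether the reshaping happens in the pivot or in the .T call.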

Upvotes: 2
