Reputation: 227
I need to plot a percentage graphic with a pivot table in pandas do you have any ideas? how I can calculate the percentage? thanks for your orientation!
This the code that I have :
df.pivot_table(index='obito', values=['asma', 'cardiopatia','diabetes','doenca_renal','obesidade']).T.plot(kind ='bar' , stacked = True)
This part of the dataframe that I have:
{'nome_munic': {66: 'Ferraz de Vasconcelos',
97: 'São Paulo',
100: 'São José dos Campos',
207: 'Mauá',
249: 'Cajamar',
258: 'Votuporanga',
285: 'Ferraz de Vasconcelos',
290: 'São Paulo',
345: 'São Pedro',
378: 'São Paulo'},
'codigo_ibge': {66: 3515707,
97: 3550308,
100: 3549904,
207: 3529401,
249: 3509205,
258: 3557105,
285: 3515707,
290: 3550308,
345: 3550407,
378: 3550308},
'idade': {66: 86,
97: 62,
100: 58,
207: 54,
249: 62,
258: 37,
285: 54,
290: 71,
345: 79,
378: 61},
'sexo': {66: 0,
97: 0,
100: 0,
207: 1,
249: 0,
258: 1,
285: 0,
290: 0,
345: 0,
378: 0},
'obito': {66: 1,
97: 0,
100: 0,
207: 1,
249: 1,
258: 1,
285: 0,
290: 1,
345: 1,
378: 0},
'asma': {66: 0,
97: 0,
100: 0,
207: 1,
249: 0,
258: 0,
285: 0,
290: 0,
345: 0,
378: 0},
'cardiopatia': {66: 1,
97: 0,
100: 1,
207: 1,
249: 1,
258: 0,
285: 1,
290: 1,
345: 0,
378: 0},
'diabetes': {66: 1,
97: 1,
100: 0,
207: 0,
249: 1,
258: 1,
285: 0,
290: 0,
345: 1,
378: 0},
'doenca_hematologica': {66: 0,
97: 0,
100: 0,
207: 0,
249: 0,
258: 0,
285: 0,
290: 0,
345: 0,
378: 0},
'doenca_hepatica': {66: 0,
97: 0,
100: 0,
207: 0,
249: 0,
258: 0,
285: 0,
290: 0,
345: 0,
378: 0},
'doenca_neurologica': {66: 0,
97: 0,
100: 0,
207: 0,
249: 0,
258: 0,
285: 0,
290: 1,
345: 0,
378: 0},
'doenca_renal': {66: 0,
97: 0,
100: 0,
207: 0,
249: 0,
258: 0,
285: 0,
290: 0,
345: 0,
378: 0},
'imunodepressao': {66: 0,
97: 0,
100: 1,
207: 0,
249: 0,
258: 0,
285: 0,
290: 0,
345: 0,
378: 0},
'obesidade': {66: 0,
97: 0,
100: 0,
207: 0,
249: 1,
258: 1,
285: 0,
290: 0,
345: 0,
378: 0},
'outros_fatores_de_risco': {66: 0,
97: 0,
100: 1,
207: 0,
249: 0,
258: 0,
285: 0,
290: 0,
345: 0,
378: 1},
'pneumopatia': {66: 0,
97: 1,
100: 0,
207: 0,
249: 0,
258: 0,
285: 0,
290: 0,
345: 0,
378: 0},
'puerpera': {66: 0,
97: 0,
100: 0,
207: 0,
249: 0,
258: 0,
285: 0,
290: 0,
345: 0,
378: 0},
'sindrome_de_down': {66: 0,
97: 0,
100: 0,
207: 0,
249: 0,
258: 0,
285: 0,
290: 0,
345: 0,
378: 0}}
Upvotes: 0
Views: 170
Reputation: 11405
The default aggfunc is np.mean
however it doesn’t compute the average per column or something like that, it computes the average of cells. In short it’s for every case the number of 1
divided by the number of 1
+ the number of 0
. This makes sense sometimes, but not for your data.
>>> df[['asma', 'cardiopatia','diabetes','doenca_renal','obesidade', 'obito']]
asma cardiopatia diabetes doenca_renal obesidade obito
66 0 1 1 0 0 1
97 0 0 1 0 0 0
100 0 1 0 0 0 0
207 1 1 0 0 0 1
249 0 1 1 0 1 1
258 0 0 1 0 1 1
285 0 1 0 0 0 0
290 0 1 0 0 0 1
345 0 0 1 0 0 1
378 0 0 0 0 0 0
>>> df.pivot_table(index='obito', values=['asma', 'cardiopatia','diabetes','doenca_renal','obesidade'])
asma cardiopatia diabetes doenca_renal obesidade
obito
0 0.000000 0.500000 0.250000 0 0.000000
1 0.166667 0.666667 0.666667 0 0.333333
Instead you probably want to count the total of patients per cell, then divide by column totals:
>>> counts = df.pivot_table(index='obito', values=['asma', 'cardiopatia','diabetes','doenca_renal','obesidade'], aggfunc=np.sum)
>>> counts / counts.sum()
asma cardiopatia diabetes doenca_renal obesidade
obito
0 0.0 0.333333 0.2 NaN 0.0
1 1.0 0.666667 0.8 NaN 1.0
Note how every column now sums to 1. doenca_renal
is NaN
because it has no patients at all in the sample, so the percentage is not defined. If you only want to plot the percentages that have obito=1
, then you can do this:
>>> from matplotlib import ticker
>>> ax = (counts / counts.sum()).loc[1].plot.bar(rot=0)
>>> ax.yaxis.set_major_formatter(ticker.PercentFormatter(xmax=1))
If you want to annotate the percentages on the bar, see this other question
Upvotes: 1