David

Display summary statistics in barplot using ggplot/plotnine

In the following simplified example, I want to display the total of each stacked bar (3 for A and 7 for B), but my code labels every individual value instead of the summary statistic. What am I doing wrong? Thank you in advance.

import io
import pandas as pd
import plotnine as p9

data_string = """V1,V2,value
A,a,1
A,b,2
B,a,3
B,b,4"""

data = io.StringIO(data_string)
df = pd.read_csv(data, sep=",")

p9.ggplot(df, p9.aes(x='V1', y='value', fill = 'V2')) + \
                p9.geom_bar(stat = 'sum') + \
                p9.stat_summary(p9.aes(label ='stat(y)'), fun_y = sum, geom = "text")

Answers (1)

stefan

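The issue is the grouping of your data. Because fill is a global aesthetic, your data are grouped by the categories of V2. Hence stat_summary computes the sum separately for each V2 group within each bar, and since every V1/V2 combination has only one row, each "sum" is just the individual value. To solve this, make fill a local aesthetic of geom_bar or geom_col, so that the text layer is grouped by V1 only.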
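To see why the labels repeat the raw values, here is a plain-Python sketch of the arithmetic only (it is not plotnine's actual implementation, and the helper sums_by is hypothetical). It sums value once per (V1, V2) group, which is what a global fill produces, and once per V1, which is what you get after moving fill into the bar layer.

```python
from collections import defaultdict

# The example data from the question as (V1, V2, value) rows.
rows = [("A", "a", 1), ("A", "b", 2), ("B", "a", 3), ("B", "b", 4)]


def sums_by(key):
    """Sum `value` within each group defined by key(v1, v2) (hypothetical helper)."""
    totals = defaultdict(int)
    for v1, v2, value in rows:
        totals[key(v1, v2)] += value
    return dict(totals)


# Global fill='V2': each (V1, V2) combination is its own group,
# so every group holds a single row and its sum is that row's value.
per_fill_group = sums_by(lambda v1, v2: (v1, v2))

# fill only on geom_col (or group=1 in stat_summary): one group per bar.
per_bar = sums_by(lambda v1, v2: v1)

print(per_fill_group)  # {('A', 'a'): 1, ('A', 'b'): 2, ('B', 'a'): 3, ('B', 'b'): 4}
print(per_bar)         # {'A': 3, 'B': 7}
```

The second result matches the totals the question asks for (3 for A and 7 for B).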

import io
import pandas as pd
import plotnine as p9

data_string = """V1,V2,value
A,a,1
A,b,2
B,a,3
B,b,4"""

data = io.StringIO(data_string)
df = pd.read_csv(data, sep=",")

p9.ggplot(df, p9.aes(x='V1', y='value')) + \
    p9.geom_col(p9.aes(fill = 'V2')) + \
    p9.stat_summary(p9.aes(label ='stat(y)'), fun_y = sum, geom = "text")

Another option is to keep fill global and override the grouping only for the summary layer by setting group=1 in stat_summary:

p9.stat_summary(p9.aes(label ='stat(y)', group = 1), fun_y = sum, geom = "text")
