JDDS

Reputation: 79

Adding counts to Plotly boxplots

I have a relatively simple issue, but cannot find any answer online that addresses it. Starting from a simple boxplot:

import plotly.express as px
 
df = px.data.iris()

fig = px.box(
    df, x='species', y='sepal_length'
)

val_counts = df['species'].value_counts()

I would now like to add val_counts (in this dataset, 50 for each species) to the plot, preferably in either of the following places:

How can I achieve this?

Upvotes: 2

Views: 4180

Answers (2)

vestland

Reputation: 61114

The snippet below uses fig.add_annotation to place the count (50 here) for each unique value of df['species'] just above that box's maximum:

for s in df.species.unique():
    fig.add_annotation(x=s,
                       y = df[df['species']==s]['sepal_length'].max(),
                       text = str(len(df[df['species']==s]['species'])),
                       yshift = 10,
                       showarrow = False
                      )

Complete code:

import plotly.express as px
 
df = px.data.iris()

fig = px.box(
    df, x='species', y='sepal_length'
)

for s in df.species.unique():
    fig.add_annotation(x=s,
                       y = df[df['species']==s]['sepal_length'].max(),
                       text = str(len(df[df['species']==s]['species'])),
                       yshift = 10,
                       showarrow = False
                      )
fig.show()

Upvotes: 3

Rob Raymond

Reputation: 31146

Using the same approach that I presented in this answer: Change Plotly Boxplot Hover Data

  • calculate all the measures a box plot calculates, plus the additional measure you want (count)
  • overlay bar traces over the box plot traces so the hover shows all the required measures

import plotly.express as px

df = px.data.iris()

# summarize data as per same dimensions as boxplot
df2 = df.groupby("species").agg(
    **{
        m
        if isinstance(m, str)
        else m[0]: ("sepal_length", m if isinstance(m, str) else m[1])
        for m in [
            "max",
            ("q75", lambda s: s.quantile(0.75)),
            "median",
            ("q25", lambda s: s.quantile(0.25)),
            "min",
            "count",
        ]
    }
).reset_index().assign(y=lambda d: d["max"] - d["min"])

# overlay bar over boxplot
px.bar(
    df2,
    x="species",
    y="y",
    base="min",
    hover_data={c: c not in ["y", "species"] for c in df2.columns},
    hover_name="species",
).update_traces(opacity=0.1).add_traces(px.box(df, x="species", y="sepal_length").data)

Upvotes: 2
