Newbielp

Reputation: 532

How to group data and also specify the percentiles in a go.box?

I am trying to produce a Plotly figure like the one further down, but instead of the whiskers showing the min and max, I want them to show the 10th and 90th percentiles, and I cannot figure out how to make it work.

There is some inspiration here and here, which show respectively that there is a way to manipulate the boxplot and to group the data, but I have not figured out how it works.

I have a piece of code that I would like to share and would appreciate some help.

import numpy as np
import pandas as pd
import plotly.graph_objects as go
from itertools import cycle

# generate a dataframe
num_rows = 100
ids = np.random.randint(1, 1000, size=num_rows)
categories = np.random.choice(['A', 'B', 'C', 'D'], size=num_rows)
phases, durations = [], []
for id in ids:
    phases.extend([1, 2, 3])
    durations.extend(np.random.randint(100, 1001, size=3))
data = {
    'id': np.repeat(ids, 3),
    'category': np.repeat(categories, 3),
    'phase': phases,
    'duration': durations}
df = pd.DataFrame(data)
df = df.sample(frac=1).reset_index(drop=True)

# calculate statistics
p10 = lambda x: x.quantile(0.10)
p25 = lambda x: x.quantile(0.25)
p50 = lambda x: x.quantile(0.50)
p75 = lambda x: x.quantile(0.75)
p90 = lambda x: x.quantile(0.90)

to_display = df.groupby(['phase', 'category'], as_index=False).agg(
    p_10 = ('duration', p10),
    p_25 = ('duration', p25),
    p_75 = ('duration', p75),
    median = ('duration', p50),
    p_90 = ('duration', p90),
    avg = ('duration', 'mean'),
)

# create plot
palette = cycle(['black', 'grey', 'red', 'blue'])
fig_grouped = go.Figure()
for i, cat in enumerate(df['category'].unique()):
    # print(i, cat)
    df_plot = df[df['category']==cat]
    fig_grouped.add_trace(go.Box(y = df_plot['duration'],
                          x = df_plot['phase'],
                          name = cat, boxpoints=False,
                          marker_color=next(palette)))
    fig_grouped.update_traces(boxmean=True)
    fig_grouped.update_layout(boxmode='group')
    
    temp = to_display[to_display['category']==cat]
    q1 = list(temp['p_25'].values)
    median = list(temp['median'].values)
    q3 = list(temp['p_75'].values)
    
    lowerfence = list(temp['p_10'].values)
    upperfence = list(temp['p_90'].values)
    
    avg = list(temp['avg'].values)
    
    print(cat)
    print('q1', q1)
    print('median', median)
    print('q3', q3)
    print('lowerfence', lowerfence)
    print('upperfence', upperfence)
    print('avg', avg)
    
    # note: update_traces without a selector applies to every trace added so far
    fig_grouped.update_traces(q1 = q1,
                              median = median,
                              q3 = q3,
                              lowerfence = lowerfence,
                              upperfence = upperfence
                              )
fig_grouped.show()

The code above gives me this: [image: plotly graph_objects boxplot]

It is very close to what I need, except that the boxes should reflect the requested percentiles, and that part does not seem to work.

PS: the to_display dataframe has the following format

    phase category   p_10    p_25    p_75   p_90         avg
0       1        A  231.8  376.25  803.75  920.6  591.964286
1       1        B  124.0  200.50  669.50  886.4  468.913043
2       1        C  203.2  318.50  784.50  901.6  535.739130
3       1        D  175.0  294.75  821.00  882.5  528.961538
4       2        A  326.5  448.25  764.25  842.8  586.928571
5       2        B  169.0  321.50  825.50  933.6  599.304348
6       2        C  138.8  352.50  808.50  936.2  556.260870
7       2        D  166.5  376.50  783.50  872.0  590.961538
8       3        A  260.2  419.00  707.00  828.2  564.928571
9       3        B  190.0  528.00  836.50  962.0  614.043478
10      3        C  343.4  450.00  812.00  894.6  630.652174
11      3        D  204.5  364.00  833.25  929.0  604.384615

Upvotes: 3

Views: 151

Answers (0)
