I am trying to create a Plotly figure like the one further down, but instead of the whiskers showing the min and max, I want them to show the 10th and 90th percentiles, and I cannot figure out a way to make it work.
I found some inspiration here and here, which show, respectively, that there is a way to manipulate the box plot and to group the data, but I have not figured out how it works.
Here is the code I have so far; I would appreciate some help.
```python
import numpy as np
import pandas as pd
import plotly.graph_objects as go
from itertools import cycle

# generate a dataframe
num_rows = 100
ids = np.random.randint(1, 1000, size=num_rows)
categories = np.random.choice(['A', 'B', 'C', 'D'], size=num_rows)
phases, durations = [], []
for _ in ids:
    phases.extend([1, 2, 3])
    durations.extend(np.random.randint(100, 1001, size=3))
data = {
    'id': np.repeat(ids, 3),
    'category': np.repeat(categories, 3),
    'phase': phases,
    'duration': durations}
df = pd.DataFrame(data)
df = df.sample(frac=1).reset_index(drop=True)

# calculate statistics
p10 = lambda x: x.quantile(0.10)
p25 = lambda x: x.quantile(0.25)
p50 = lambda x: x.quantile(0.50)
p75 = lambda x: x.quantile(0.75)
p90 = lambda x: x.quantile(0.90)
to_display = df.groupby(['phase', 'category'], as_index=False).agg(p_10=('duration', p10),
                                                                   p_25=('duration', p25),
                                                                   p_75=('duration', p75),
                                                                   median=('duration', p50),
                                                                   p_90=('duration', p90),
                                                                   avg=('duration', 'mean'))

# create plot
palette = cycle(['black', 'grey', 'red', 'blue'])
fig_grouped = go.Figure()
for i, cat in enumerate(df['category'].unique()):
    # print(i, cat)
    df_plot = df[df['category'] == cat]
    fig_grouped.add_trace(go.Box(y=df_plot['duration'],
                                 x=df_plot['phase'],
                                 name=cat, boxpoints=False,
                                 marker_color=next(palette)))
    temp = to_display[to_display['category'] == cat]
    q1 = list(temp['p_25'].values)
    median = list(temp['median'].values)
    q3 = list(temp['p_75'].values)
    lowerfence = list(temp['p_10'].values)
    upperfence = list(temp['p_90'].values)
    avg = list(temp['avg'].values)
    print(cat)
    print('q1', q1)
    print('median', median)
    print('q3', q3)
    print('lowerfence', lowerfence)
    print('upperfence', upperfence)
    print('avg', avg)
    # `fig` was undefined here; update only the trace added for this category
    fig_grouped.update_traces(q1=q1,
                              median=median,
                              q3=q3,
                              lowerfence=lowerfence,
                              upperfence=upperfence,
                              selector=dict(name=cat))

fig_grouped.update_traces(boxmean=True)
fig_grouped.update_layout(boxmode='group')
fig_grouped.show()
```
It is very close to what I need, except that the boxes and whiskers should reflect the requested percentiles, and setting them does not seem to work.
PS: the `to_display` dataframe has the following format:

```
    phase category   p_10    p_25    p_75   p_90         avg
 0      1        A  231.8  376.25  803.75  920.6  591.964286
 1      1        B  124.0  200.50  669.50  886.4  468.913043
 2      1        C  203.2  318.50  784.50  901.6  535.739130
 3      1        D  175.0  294.75  821.00  882.5  528.961538
 4      2        A  326.5  448.25  764.25  842.8  586.928571
 5      2        B  169.0  321.50  825.50  933.6  599.304348
 6      2        C  138.8  352.50  808.50  936.2  556.260870
 7      2        D  166.5  376.50  783.50  872.0  590.961538
 8      3        A  260.2  419.00  707.00  828.2  564.928571
 9      3        B  190.0  528.00  836.50  962.0  614.043478
10      3        C  343.4  450.00  812.00  894.6  630.652174
11      3        D  204.5  364.00  833.25  929.0  604.384615
```
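As an aside, a table in this format can also be built without one lambda per percentile: `SeriesGroupBy.quantile` accepts a list of quantiles and returns a Series indexed by (phase, category, quantile), which `unstack()` turns into columns. A sketch on hypothetical random data of the same shape (the column names are chosen to match `to_display`):

```python
import numpy as np
import pandas as pd

# Hypothetical data with the same columns as the question's df
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'phase': np.tile([1, 2, 3], 40),
    'category': rng.choice(['A', 'B', 'C', 'D'], size=120),
    'duration': rng.integers(100, 1001, size=120),
})

grouped = df.groupby(['phase', 'category'])['duration']
# Series indexed by (phase, category, q); unstack() moves q into columns
pct = grouped.quantile([0.10, 0.25, 0.50, 0.75, 0.90]).unstack()
pct.columns = ['p_10', 'p_25', 'median', 'p_75', 'p_90']
# Add the mean (aligned on the same index) and flatten the index into columns
to_display = pct.assign(avg=grouped.mean()).reset_index()
print(to_display)
```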