KansaiRobot

Reputation: 9952

How to specify the percentiles in a Plotly box plot?

Say I have the simplest of scripts

import plotly.offline as pyo
import plotly.graph_objs as go

# set up an array of 20 data points, with 20 as the median value
y = [1,14,14,15,16,18,18,19,19,20,20,23,24,26,27,27,28,29,33,54]

data = [
    go.Box(
        y=y,
        boxpoints='outliers' # display only outlying data points
    )
]
pyo.plot(data, filename='box2.html')

With that I have the following

box plot

As I understand it, the box spans the 25th and 75th percentiles. Is there a way to change which percentiles are shown?

Upvotes: 1

Views: 3048

Answers (2)

Rob Raymond

Reputation: 31196

  • As the other comment and answer note, quartiles are quartiles, although they can be calculated in more than one way.
  • If you want percentiles in addition to the quartiles, you can add extra lines to the figure.

The code below adds nine percentile lines (10th through 90th) to the figure:

import plotly.graph_objs as go
import numpy as np
import pandas as pd
import plotly.express as px

# set up an array of 20 data points, with 20 as the median value
y = [1,14,14,15,16,18,18,19,19,20,20,23,24,26,27,27,28,29,33,54]

data = [
    go.Box(
        y=y,
        boxpoints='outliers' # display only outlying data points
    )
]

fig = go.Figure(data)

fig.add_traces(
    px.line(
        pd.DataFrame(
            {
                "y": np.repeat(np.percentile(y, np.linspace(10, 90, 9)), 2),
                "x": np.tile([0.4, 0.6], 9),
                "p": np.repeat(np.linspace(10, 90, 9), 2),
            }
        ),
        x="x",
        y="y",
        color="p",
    )
    .update_traces(showlegend=False, xaxis="x2")
    .data
).update_layout(xaxis2={"overlaying": "x", "visible": False, "range": [0, 1]})
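
To check which values those nine lines are drawn at without rendering the figure, the deciles can be printed directly. This is only a sketch using the standard library instead of NumPy; it assumes that `statistics.quantiles` with `method="inclusive"` interpolates the same way as the default `np.percentile` used above.

```python
from statistics import quantiles

y = [1, 14, 14, 15, 16, 18, 18, 19, 19, 20, 20, 23, 24, 26, 27, 27, 28, 29, 33, 54]

# n=10 yields the nine cut points between deciles (10th, 20th, ..., 90th percentile).
# method="inclusive" interpolates linearly between neighbouring data points
# (assumed here to agree with np.percentile's default behaviour).
deciles = quantiles(y, n=10, method="inclusive")

for pct, value in zip(range(10, 100, 10), deciles):
    print(f"{pct}th percentile: {value}")
```

Each printed value corresponds to the height of one horizontal line in the figure.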


Upvotes: 1

Derek O

Reputation: 19600

You can pass your own precomputed quartiles, so Q1 and Q3 can be whatever values you like: compute the percentiles you want and pass them in as Q1 and Q3.

In another question about plotly boxplots that I answered here, I wrote a function that computes the percentiles using the same method plotly uses.

You'll need to specify the lowerfence, (new) Q1, median, (new) Q3, and upperfence in the update_traces method. Here is what I get if I set Q1 = 5th percentile and Q3 = 95th percentile:

from math import floor, ceil
import plotly.graph_objs as go

## calculate quartiles as outlined in the plotly documentation 
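def get_percentile(data, p):
    data = sorted(data)  # sort a copy so the caller's list is not mutated
    n = len(data)
    x = min(max(n*p + 0.5, 1), n)  # 1-based position, clamped to the valid range
    x1, x2 = floor(x), ceil(x)
    y1, y2 = data[x1-1], data[x2-1] # account for zero-indexing
    if x1 == x2:  # x landed exactly on a data point, so there is nothing to interpolate
        return y1
    return y1 + ((x - x1) / (x2 - x1))*(y2 - y1)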

# set up an array of 20 data points, with 20 as the median value
y = [1,14,14,15,16,18,18,19,19,20,20,23,24,26,27,27,28,29,33,54]

fig = go.Figure()
fig.add_traces(go.Box(
    y=y,
    boxpoints='outliers' # display only outlying data points
))

q1, median, q3 = get_percentile(y, 0.05), get_percentile(y, 0.50), get_percentile(y, 0.95)

fig.update_traces(q1=[q1], median=[median],
                  q3=[q3], lowerfence=[min(y)],
                  upperfence=[max(y)], orientation='v')
fig.show()
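
If you want to try several percentile pairs, the five values can be bundled into one dictionary. The following is a rough sketch rather than anything from the Plotly API: `interpolated_percentile` and `box_stats` are hypothetical helper names, and the interpolation simply restates the `n*p + 0.5` rule from the function above using only the standard library.

```python
from math import floor

def interpolated_percentile(data, p):
    """Linear interpolation at the 1-based position n*p + 0.5 (same rule as above)."""
    values = sorted(data)
    n = len(values)
    x = min(max(n * p + 0.5, 1), n)   # clamp so the position stays inside the data
    lo = floor(x)
    hi = min(lo + 1, n)               # neighbour above, capped at the last point
    frac = x - lo
    return values[lo - 1] + frac * (values[hi - 1] - values[lo - 1])

def box_stats(data, lower_p=0.25, upper_p=0.75):
    """Hypothetical helper: box edges at arbitrary percentiles, fences at min/max."""
    return {
        "q1": [interpolated_percentile(data, lower_p)],
        "median": [interpolated_percentile(data, 0.50)],
        "q3": [interpolated_percentile(data, upper_p)],
        "lowerfence": [min(data)],
        "upperfence": [max(data)],
    }

y = [1, 14, 14, 15, 16, 18, 18, 19, 19, 20, 20, 23, 24, 26, 27, 27, 28, 29, 33, 54]
stats = box_stats(y, 0.10, 0.90)      # box spans the 10th to 90th percentile
print(stats)
```

The resulting dictionary uses the same keyword names as the `update_traces` call above, so it could be passed along as `fig.update_traces(**stats)`.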


Upvotes: 3
