Sayyor Y

Reputation: 1314

Plotting averages of box plots as a box plot

I have a set of lists (about 100) of the form [6, 17, 5, 1, 4, 7, 14, 19, 0, 10] and I want to get one box plot which plots the averages of box-plot information (i.e. median, max, min, Q1, Q3, outliers) of all of the lists.

For example, if I have 2 lists

l1 = [6, 17, 5, 1, 4, 7, 14, 19, 0, 10]
l2 = [4, 12, 3, 5, 16, 0, 14, 7, 8, 15]

I can get averages of max, median, and min of the lists as follows:

import numpy as np

maxs = np.array([])
mins = np.array([])
medians = np.array([])
for l in [l1, l2]:
    medians = np.append(medians, np.median(l))
    maxs = np.append(maxs, np.max(l))
    mins = np.append(mins, np.min(l))
averMax = np.mean(maxs)
averMin = np.mean(mins)
averMedian = np.mean(medians)

I should do the same for other info in the box plot such as average Q1, average Q3. I then need to use this information (averMax, averMin, etc.) to plot just one single box plot (not multiple box plots in one graph).
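For reference, the averaging described above can be sketched in a few lines with `np.percentile`, which computes all five summary values per list in one call. This is only an illustration and assumes the lists all have the same length; the names `data` and `per_list` are introduced here and are not part of the original code. Outliers are not handled.

```python
import numpy as np

l1 = [6, 17, 5, 1, 4, 7, 14, 19, 0, 10]
l2 = [4, 12, 3, 5, 16, 0, 14, 7, 8, 15]

# Assumption: every list has the same length, so they stack into a 2-D array
# of shape (n_lists, n_values).
data = np.array([l1, l2])

# Rows of per_list: min, Q1, median, Q3, max; one column per list.
per_list = np.percentile(data, [0, 25, 50, 75, 100], axis=1)

# Average each statistic across the lists.
aver_min, aver_q1, aver_median, aver_q3, aver_max = per_list.mean(axis=1)

print(aver_min, aver_q1, aver_median, aver_q3, aver_max)
```

With about 100 lists, only the construction of `data` changes; the rest stays the same.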

I know from Draw Box-Plot with matplotlib that you don't have to calculate the values for a normal box plot. You just need to specify the data as a variable. Is it possible to do the same for my case instead of manually calculating the averages of the values of all the lists?

Upvotes: 0

Views: 838

Answers (1)

r-beginners

Reputation: 35230

DataFrame.describe() returns the quartiles, so you can build the plot from them. I adapted the calculated numbers with the help of this answer and the example graph in the official reference.

import pandas as pd
import numpy as np

l1 = [6, 17, 5, 1, 4, 7, 14, 19, 0, 10]
l2 = [4, 12, 3, 5, 16, 0, 14, 7, 8, 15]

df = pd.DataFrame({'l1':l1, 'l2':l2}, index=np.arange(len(l1)))

df.describe()

              l1         l2
count  10.000000  10.000000
mean    8.300000   8.400000
std     6.532823   5.561774
min     0.000000   0.000000
25%     4.250000   4.250000
50%     6.500000   7.500000
75%    13.000000  13.500000
max    19.000000  16.000000

import matplotlib.pyplot as plt

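# Use the describe() results, not the raw lists.
# describe() rows by position: 4 -> 25% (Q1), 5 -> 50% (median), 6 -> 75% (Q3)
d1 = df['l1'].describe()
d2 = df['l2'].describe()

# Q1 - 1.5*IQR, Q1, median, median + 1.5*IQR
x1 = [d1.iloc[4]-1.5*(d1.iloc[6]-d1.iloc[4]), d1.iloc[4], d1.iloc[5], d1.iloc[5]+1.5*(d1.iloc[6]-d1.iloc[4])]
x2 = [d2.iloc[4]-1.5*(d2.iloc[6]-d2.iloc[4]), d2.iloc[4], d2.iloc[5], d2.iloc[5]+1.5*(d2.iloc[6]-d2.iloc[4])]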

fig = plt.figure(figsize=(8,6))

plt.boxplot([x1, x2], notch=False, sym='rs')
plt.xticks([1, 2], ['x1', 'x2'])
plt.xlabel('measurement x')
t = plt.title('Box plot')
plt.show()
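If the goal is a single box drawn from the averaged values, one possible approach is to average the rows of describe() across the lists and pass the result to matplotlib's `Axes.bxp`, which draws a box plot from precomputed statistics. The sketch below is not the method used above. It assumes the whiskers should sit at the average min and max rather than at 1.5*IQR, and it does not average outliers; the names `aver` and `stats` are introduced here for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

l1 = [6, 17, 5, 1, 4, 7, 14, 19, 0, 10]
l2 = [4, 12, 3, 5, 16, 0, 14, 7, 8, 15]
df = pd.DataFrame({'l1': l1, 'l2': l2})

# Average every describe() row (min, 25%, 50%, 75%, max, ...) across the lists.
aver = df.describe().mean(axis=1)

# One dict per box, using the keys Axes.bxp reads.
# Assumption: whiskers are placed at the average min and max.
stats = [{
    'label': 'average',
    'whislo': aver['min'],
    'q1': aver['25%'],
    'med': aver['50%'],
    'q3': aver['75%'],
    'whishi': aver['max'],
    'fliers': [],  # outliers are not averaged in this sketch
}]

fig, ax = plt.subplots(figsize=(8, 6))
ax.bxp(stats, showfliers=False)
ax.set_title('Average box plot')
```

Call plt.show() or fig.savefig(...) afterwards to display or save the figure.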


Upvotes: 0
