I have a set of about 100 lists of the form [6, 17, 5, 1, 4, 7, 14, 19, 0, 10], and I want a single box plot that shows the averages of the box-plot statistics (median, max, min, Q1, Q3, outliers) over all of the lists.
For example, if I have 2 lists
l1 = [6, 17, 5, 1, 4, 7, 14, 19, 0, 10]
l2 = [4, 12, 3, 5, 16, 0, 14, 7, 8, 15]
I can get the average max, median, and min of the lists as follows:
import numpy as np
maxs = np.array([])
mins = np.array([])
medians = np.array([])
# Collect one statistic per list
for l in [l1, l2]:
    medians = np.append(medians, np.median(l))
    maxs = np.append(maxs, np.max(l))
    mins = np.append(mins, np.min(l))
# Average each statistic over all lists
averMax = np.mean(maxs)
averMin = np.mean(mins)
averMedian = np.mean(medians)
I need to do the same for the other box-plot statistics, such as the average Q1 and average Q3. I then want to use these values (averMax, averMin, etc.) to draw a single box plot (not multiple box plots in one graph).
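For the Q1/Q3 part, the loop above can be replaced by a vectorized version. This is a minimal sketch, assuming all lists have the same length so they stack into one 2-D array (the name lists is hypothetical): each statistic is computed along axis=1 (one value per list) and then averaged over the lists. np.percentile uses linear interpolation by default, the same rule pandas uses in describe().

```python
import numpy as np

l1 = [6, 17, 5, 1, 4, 7, 14, 19, 0, 10]
l2 = [4, 12, 3, 5, 16, 0, 14, 7, 8, 15]
lists = np.array([l1, l2])  # shape (n_lists, n_values); assumes equal lengths

# One value per list (axis=1), then the mean over all lists
q1, median, q3 = np.percentile(lists, [25, 50, 75], axis=1)
averQ1 = q1.mean()
averMedian = median.mean()
averQ3 = q3.mean()
averMin = lists.min(axis=1).mean()
averMax = lists.max(axis=1).mean()

print(averMin, averQ1, averMedian, averQ3, averMax)  # 0.0 4.25 7.0 13.25 17.5
```

With about 100 lists, only the construction of lists changes; the rest is identical.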
I know from Draw Box-Plot with matplotlib that for a normal box plot you don't have to calculate these values yourself; you just pass the data to plt.boxplot. Is it possible to do the same in my case, instead of manually calculating the averages over all the lists?
DataFrame.describe() computes the quartiles, so you can build the graph from them. I adapted the calculated numbers with the help of this answer and the example graph from the official reference.
import pandas as pd
import numpy as np
l1 = [6, 17, 5, 1, 4, 7, 14, 19, 0, 10]
l2 = [4, 12, 3, 5, 16, 0, 14, 7, 8, 15]
# One column per list
df = pd.DataFrame({'l1': l1, 'l2': l2}, index=np.arange(len(l1)))
print(df.describe())
              l1         l2
count  10.000000  10.000000
mean    8.300000   8.400000
std     6.532823   5.561774
min     0.000000   0.000000
25%     4.250000   4.250000
50%     6.500000   7.500000
75%    13.000000  13.500000
max    19.000000  16.000000
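Since each column of the describe() output holds one list's statistics, averaging across the columns gives the averaged values the question asks for. A minimal sketch (the count row is dropped because it is just the list length; the name avg_stats is hypothetical):

```python
import pandas as pd

l1 = [6, 17, 5, 1, 4, 7, 14, 19, 0, 10]
l2 = [4, 12, 3, 5, 16, 0, 14, 7, 8, 15]
df = pd.DataFrame({'l1': l1, 'l2': l2})

# Each describe() column summarizes one list; mean(axis=1) averages across lists
avg_stats = df.describe().drop(index='count').mean(axis=1)
print(avg_stats)
```

With about 100 lists stored in a list of lists, pd.DataFrame(list_of_lists).T gives the same layout (one column per list).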
import matplotlib.pyplot as plt
desc = df.describe()

def quartile_points(col):
    # Four points built from the describe() rows of one column:
    # Q1 - 1.5*IQR, Q1, median, median + 1.5*IQR
    q1, med, q3 = desc.loc['25%', col], desc.loc['50%', col], desc.loc['75%', col]
    iqr = q3 - q1
    return [q1 - 1.5 * iqr, q1, med, med + 1.5 * iqr]

x1 = quartile_points('l1')
x2 = quartile_points('l2')
fig = plt.figure(figsize=(8, 6))
plt.boxplot([x1, x2], sym='rs')  # 'rs': draw fliers as red squares
plt.xticks([1, 2], ['x1', 'x2'])
plt.xlabel('measurement x')
plt.title('Box plot')
plt.show()
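Note that plt.boxplot always recomputes quartiles and whiskers from whatever points it receives, so passing it a few derived values does not draw those values exactly. To draw precomputed statistics, matplotlib's Axes.bxp takes them directly as a list of dicts, and matplotlib.cbook.boxplot_stats returns those dicts for raw data (the statistics boxplot would draw). A sketch under these assumptions: every scalar statistic is averaged over all lists, while fliers are left empty, because each list can have a different number of outliers and averaging them is not well defined (the names lists, keys and averaged are hypothetical).

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import cbook

l1 = [6, 17, 5, 1, 4, 7, 14, 19, 0, 10]
l2 = [4, 12, 3, 5, 16, 0, 14, 7, 8, 15]
lists = [l1, l2]  # a list of ~100 lists works the same way

# One stats dict per list (keys include 'med', 'q1', 'q3', 'whislo', 'whishi', 'mean', 'fliers')
per_list = cbook.boxplot_stats(lists)

# Average each scalar statistic over all lists
keys = ['med', 'q1', 'q3', 'whislo', 'whishi', 'mean']
averaged = {k: np.mean([s[k] for s in per_list]) for k in keys}
averaged['fliers'] = []  # outlier sets differ per list, so no fliers are drawn
averaged['label'] = 'average'

# Draw one box directly from the averaged statistics
fig, ax = plt.subplots(figsize=(8, 6))
ax.bxp([averaged], showmeans=True)
ax.set_title('Box plot of averaged statistics')
plt.show()
```

boxplot_stats also accepts a whis argument if the whiskers should follow a rule other than 1.5 × IQR.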