Reputation: 33
I would like to plot boxplots of DataFrames (see the sample code below). How can I disable outlier detection? I don't want to remove the outliers; I just want a plot that visualizes the data by marking 0%, 25%, 50% and 75% of the data points, without applying any outlier criteria.
How do I have to modify my code to achieve this? Can I change the outlier detection criteria so that detection is effectively disabled?
I would be very grateful for any help, and if there is already another thread about this (which I didn't find), I would be happy to get a link to it.
Many thanks! Jordin
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(1234)
df = pd.DataFrame(np.random.randn(10, 4),
                  columns=['Col1', 'Col2', 'Col3', 'Col4'])
plt.figure()
plt.boxplot(df.values)
plt.show()
EDIT:
I would like the whiskers to extend to this outlier, not merely hide it from the plot.
Upvotes: 3
Views: 2225
Reputation: 4125
You're looking for the whis parameter.
From the documentation:
whis : float, sequence, or string (default = 1.5)
As a float, determines the reach of the whiskers beyond the first and third quartiles. In other words, where IQR is the interquartile range (Q3 - Q1), the upper whisker will extend to the last datum less than Q3 + whis*IQR. Similarly, the lower whisker will extend to the first datum greater than Q1 - whis*IQR. Beyond the whiskers, data are considered outliers and are plotted as individual points. Set this to an unreasonably high value to force the whiskers to show the min and max values. Alternatively, set this to an ascending sequence of percentiles (e.g., [5, 95]) to set the whiskers at specific percentiles of the data. Finally, whis can be the string 'range' to force the whiskers to the min and max of the data.
Add it like so:
df.boxplot(whis=99)
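As a rough sketch of the percentile form mentioned in the quoted documentation, the snippet below passes whis=[0, 100] to plt-style boxplot on the question's data and then checks that the whiskers land on each column's minimum and maximum. The choice of [0, 100] (rather than the documented [5, 95] example) and the Agg backend are assumptions made here so the check can run without a display; the 'range' string from the quote may not be accepted by every Matplotlib version, so it is not used.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(1234)
df = pd.DataFrame(np.random.randn(10, 4),
                  columns=['Col1', 'Col2', 'Col3', 'Col4'])

fig, ax = plt.subplots()
# whis=[0, 100] puts the whiskers at the 0th and 100th percentiles,
# i.e. at the minimum and maximum, so no point is classified as an outlier.
result = ax.boxplot(df.values, whis=[0, 100])
ax.set_xticklabels(df.columns)

# The returned dict holds two whisker lines per box (lower, then upper);
# the second y-value of each line is the whisker's far end.
whisker_ends = [line.get_ydata()[1] for line in result['whiskers']]
lows = whisker_ends[0::2]
highs = whisker_ends[1::2]

# Count the points that were drawn as outliers ("fliers").
n_fliers = sum(len(f.get_ydata()) for f in result['fliers'])
```

If this behaves as expected, n_fliers is zero and lows/highs match df.min() and df.max(). Replacing the Agg line with plt.show() at the end should display the figure interactively.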
Upvotes: 1
Reputation: 332
If you add sym='' inside your plot function, I think you will get what you ask for:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(1234)
df = pd.DataFrame(np.random.randn(10, 4),
                  columns=['Col1', 'Col2', 'Col3', 'Col4'])
df.boxplot(sym='')
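To see how sym='' relates to the edit in the question, here is a small sketch that compares the upper whisker with sym='' against the upper whisker with whis=[0, 100]. The data is made up for illustration (nine evenly spaced values plus one obvious outlier, not the question's DataFrame), and the Agg backend is an assumption so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample: 1..9 plus one value far above the rest.
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 50], dtype=float)

fig, (ax_hidden, ax_full) = plt.subplots(1, 2)

# sym='' only suppresses the outlier markers; whiskers keep the default whis=1.5.
hidden = ax_hidden.boxplot(data, sym='')
# whis=[0, 100] moves the whiskers out to the minimum and maximum instead.
full = ax_full.boxplot(data, whis=[0, 100])

# whiskers[1] is the upper whisker of the single box; index 1 is its far end.
hidden_top = hidden['whiskers'][1].get_ydata()[1]
full_top = full['whiskers'][1].get_ydata()[1]

print("upper whisker with sym='':", hidden_top)
print("upper whisker with whis=[0, 100]:", full_top)
```

On this sample the first whisker should stop at 9 while the second reaches 50, which suggests sym='' hides the point but does not include it in the whiskers.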
Upvotes: 1