jordin1987

Reputation: 33

No outlier detection in boxplot

I would like to plot boxplots of dataframes (see sample code below). What I'm wondering is: how can I disable outlier detection? I don't want to remove the outliers; I just want a plot that visualizes the data by marking 0%, 25%, 50% and 75% of the data points, without applying any outlier criterion.

How do I have to modify my code to achieve this? Can I change the outlier detection criterion so that it behaves as if it were disabled?

I would be very grateful for any help, and if there is already another thread about this (which I didn't find), I would be happy to get a link to it.

Many thanks! Jordin

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(1234)
df = pd.DataFrame(np.random.randn(10, 4),
                  columns=['Col1', 'Col2', 'Col3', 'Col4'])

plt.figure()
plt.boxplot(df.values)
plt.show()

EDIT:

[Image: boxplot in which the point at the top right is marked as an outlier]

I would like the whiskers to extend far enough to include this point, rather than merely hiding its marker.

Upvotes: 3

Views: 2225

Answers (2)

Mitchell van Zuylen

Reputation: 4125

You're looking for the whis parameter.

From the documentation:

whis : float, sequence, or string (default = 1.5)

As a float, determines the reach of the whiskers beyond the first and third quartiles. In other words, where IQR is the interquartile range (Q3 - Q1), the upper whisker will extend to the last datum less than Q3 + whis*IQR. Similarly, the lower whisker will extend to the first datum greater than Q1 - whis*IQR. Beyond the whiskers, data are considered outliers and are plotted as individual points. Set this to an unreasonably high value to force the whiskers to show the min and max values. Alternatively, set this to an ascending sequence of percentiles (e.g., [5, 95]) to set the whiskers at specific percentiles of the data. Finally, whis can be the string 'range' to force the whiskers to the min and max of the data.
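To make the rule in that passage concrete, here is a small NumPy sketch of how the whisker ends follow from whis. The helper name whisker_ends and the sample values are made up for illustration; this is not Matplotlib's own code, only the formula quoted above under the assumption of NumPy's default (linear) percentile interpolation.

```python
import numpy as np

def whisker_ends(x, whis=1.5):
    """Return (lower, upper) whisker ends for the Q1/Q3 +/- whis*IQR rule."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lo_bound = q1 - whis * iqr
    hi_bound = q3 + whis * iqr
    # Whiskers stop at the most extreme data points still inside the bounds.
    lower = x[x >= lo_bound].min()
    upper = x[x <= hi_bound].max()
    return lower, upper

sample = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]  # hypothetical data with one far-off point

default_ends = whisker_ends(sample)        # 20.0 lies beyond the upper bound
wide_ends = whisker_ends(sample, whis=99)  # bounds so wide that nothing is excluded
print(default_ends, wide_ends)
```

With the default whis the point 20.0 falls outside the upper bound and would be drawn as an outlier; with a very large whis the upper whisker reaches it.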

Add it like so:

df.boxplot(whis=99)
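Applied to the code from the question, a minimal sketch could look like the following. It uses the percentile form whis=(0, 100) instead of a large float, which pins the whiskers to the minimum and maximum directly; as far as I know, recent Matplotlib releases no longer accept the string 'range', so the percentile pair is the safer spelling. The Agg backend line is only there so the sketch runs without a display, and the n_fliers check is my own addition.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(1234)
df = pd.DataFrame(np.random.randn(10, 4),
                  columns=['Col1', 'Col2', 'Col3', 'Col4'])

fig, ax = plt.subplots()
# Whiskers at the 0th and 100th percentiles, i.e. the min and max of each column.
result = ax.boxplot(df.values, whis=(0, 100))
ax.set_xticklabels(df.columns)

# Count the points still drawn as outliers across all boxes.
n_fliers = sum(len(line.get_ydata()) for line in result["fliers"])
print("points drawn as outliers:", n_fliers)
```

Because every data point lies between the 0th and 100th percentiles, no point is left over to be drawn as an outlier.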

Upvotes: 1

bjornsing

Reputation: 332

If you add sym='' inside your plot function I think you will get what you ask for:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(1234)
df = pd.DataFrame(np.random.randn(10, 4),
                  columns=['Col1', 'Col2', 'Col3', 'Col4'])

df.boxplot(sym='')
plt.show()
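One caveat, given the EDIT in the question: sym='' only hides the outlier markers, while the whiskers still follow the default 1.5*IQR rule. The sketch below compares the upper whisker tips for sym='' and for whis=(0, 100) on the same data. The helper upper_whisker_tips is my own name, and it assumes that the "whiskers" list returned by boxplot holds a lower and an upper line per box, in that order.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(1234)
data = np.random.randn(10, 4)

fig, (ax_hidden, ax_full) = plt.subplots(1, 2, sharey=True)
hidden = ax_hidden.boxplot(data, sym='')         # markers hidden, whiskers unchanged
full = ax_full.boxplot(data, whis=(0, 100))      # whiskers extended to min and max
ax_hidden.set_title("sym=''")
ax_full.set_title("whis=(0, 100)")

def upper_whisker_tips(result):
    # Each whisker line runs from the box edge to the tip, so the tip is
    # the second y-value; upper whiskers are every second entry.
    return [line.get_ydata()[1] for line in result["whiskers"][1::2]]

hidden_tips = upper_whisker_tips(hidden)
full_tips = upper_whisker_tips(full)
print(hidden_tips)
print(full_tips)
```

Wherever the default rule flags a point, the sym='' whisker stops short of it, whereas the whis=(0, 100) whisker always reaches the column maximum.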

Upvotes: 1