dan
dan

Reputation: 6304

Remove whiskers and outliers in R plotly

I have continuous data that I'd like to plot using R's plotly with a box or violin plot without the outliers and whiskers:

set.seed(1)
df <- data.frame(group=c(rep("g1",500),rep("g2",700),rep("g3",600)),
                 value=c(c(rep(0,490),runif(10,10,15)),abs(rnorm(700,1,10)),c(rep(0,590),runif(10,10,15))),
                 stringsAsFactors = F)
df$group <- factor(df$group, levels = c("g1","g2","g3"))

I know how to remove outliers in plotly:

plotly::plot_ly(x = df$group, y =df$value, type = 'box', color = df$group, boxpoints = F, showlegend = F)

enter image description here

But I'm still left with the whiskers.

I tried using ggplot2 for that (also limiting the height of the y-axis to that of the 75 percentile):

library(ggplot2)
gp <- ggplot(df, aes(group, value, color = group, fill = group)) + geom_boxplot(outlier.shape = NA, coef = 0) +
  scale_y_continuous(limits = c(0, ceiling(max(dplyr::summarise(dplyr::group_by(df, group), tile = quantile(value, probs = 0.75))$tile)))) +
  theme_minimal() + theme(legend.position = "none",axis.title = element_blank())

enter image description here

But then trying to convert that to a plotly object doesn't maintain that:

plotly::ggplotly(gp)

enter image description here

Any idea?

Upvotes: 3

Views: 1308

Answers (1)

Kat
Kat

Reputation: 18714

This is a workaround.

I changed your plot a bit, first.

# box without outliers
p <- plot_ly(df, x = ~group, y = ~value, type = 'box', 
        color = ~group, boxpoints = F, showlegend = F,
        whiskerwidth = 0, line = list(width = 0)) # no whisker, max or min line

Then I add the medians back to the graph. This requires calculating the medians, matching the colors, and creating the shape lists for Plotly.

For the colors, it's odd, the first three default colors are used, but the order is g3, g2, g1...

# the medians
res = df %>% group_by(group) %>% 
  summarise(med = median(value))

# default color list: https://community.plotly.com/t/plotly-colours-list/11730/2
col = rev(c('#1f77b4', '#ff7f0e', '#2ca02c')) # the plot is colored 3, 2, 1

# discrete x-axis; domain default [0, 1]
# default box margin = .08, three groups, each get 1/3 of space
details <- function(col){ # need everytime basics
  list(type = 'line',
       line = list(color = col, width = 4),
       xref = "paper", yref = "y")
}

# horizontal segments/ median
segs = lapply(1:nrow(res),
              function(k){
                x1 <- k/3 - .08  # if the domain is [0, 1]
                x0 <- (k - 1)/3 + .08
                y0 <- y1 <- res[k, ]$med
                line = list("x0" = x0, "x1" = x1,
                            "y0" = y0, "y1" = y1)
                deets = details(col[k])
                c(deets, line)
              })

Finally, I added them back onto the plot.

p %>% layout(shapes = segs)

I made the lines obnoxiously wide, but you get the idea.

enter image description here

If you wanted the IQR outline back, you could do this, as well. I used functions here, as well. I figured that the data you've provided is not the actual data, so the function will serve a purpose.

# include IQR outline
res2 = df %>% group_by(group) %>% 
  summarise(q1 = setNames(quantile(value, type = 7, 1/4), NULL),
            q3 = setNames(quantile(value, type = 7, 3/4), NULL),
            med = median(value))

# IQR segments 
rects = lapply(1:nrow(res2),         # if the domain is [0, 1]
               function(k){
                 x1 <- k/3 - .08
                 x0 <- (k - 1)/3 + .08
                 y0 <- res2[k, ]$q1
                 y1 <- res2[k, ]$q3
                 line = list(color = col[k], width = 4)
                 rect = list("x0" = x0, "x1" = x1,
                             "y0" = y0, "y1" = y1,
                             type = "rect", xref = "paper",
                             yref = "y", "line" = line)
                 rect
               })

rects = append(segs, rects)
p %>% layout(shapes = rects)

enter image description here

Upvotes: 3

Related Questions