John

Reputation: 1947

Some boxplot improvements

I have a problem that I'm not able to solve. Let's say I have the following code:

x <- rnorm(100)
x <- append(x, 5)
ggplot() +
  geom_boxplot(aes(y = x), width = 3) +
  scale_y_continuous(breaks = round(c(median(x), summary(x)[['1st Qu.']], summary(x)[['3rd Qu.']], max(x), min(x)), digits = 2))


As you can see above, the boxplot is huge! How can I change its width? I tried the width argument, but it had no effect, which is strange to me because I have read posts where width worked.

The second thing I want to do is to mark the values that are relevant to the boxplot (median, 1st quartile, 3rd quartile, max, min) on the axis, but I always want the top of the upper whisker to be marked (max(), for instance, marks the outlier if one exists). As you can see above, the outlier is marked by max(), but the top of the upper whisker is not. Is there any way to do this? I tried adding the following to breaks:

 ifelse(max(x)>(3/2*summary(x)[['3rd Qu.']]),max(x[x<3/2*summary(x)[['3rd Qu.']]]),"")

But it's not working. My intuition was: if an outlier exists (the condition), then mark the top of the whisker, which is the value nearest to 3/2 * the 3rd quartile. Is there any way to solve these two issues?

Upvotes: 0

Views: 48

Answers (1)

det

Reputation: 5232

You can try:

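library(ggplot2)
library(magrittr)  # provides %>%

set.seed(123)

x <- rnorm(100)
x <- append(x,5)

# min, Q1, median, Q3, max (element 4 of summary() is the mean, so drop it)
my_breaks <- summary(x)[-4] %>% as.numeric()
# fences: Q1 - 1.5 * IQR and Q3 + 1.5 * IQR
my_range <- my_breaks[c(2,4)] + (my_breaks[c(2,4)] %>% diff() * 3/2) * c(-1, 1)
# whisker ends: the most extreme observations that lie inside the fences
my_breaks <- c(my_breaks, x[x >= my_range[1] & x <= my_range[2]] %>% range())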

data <- data.frame(group = factor(0), y = x)

ggplot(data)+
  geom_boxplot(aes(group, y), width = 0.25) +
  labs(x = NULL) +
  scale_y_continuous(breaks = round(my_breaks, 2)) +
  theme(
    axis.title.x=element_blank(),
    axis.text.x=element_blank(),
    axis.ticks.x=element_blank()
  )
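
A note on why the attempt in the question misses the whisker: as far as I know, the default whiskers reach to the most extreme observation within 1.5 * IQR of the quartiles, not to 3/2 times the 3rd quartile. That is what my_range encodes. A minimal base-R sketch of the same rule, where whisker_ends is a hypothetical helper name (not a ggplot2 function) and coef = 1.5 is assumed to match the geom_boxplot default:

```r
# Hypothetical helper, not part of ggplot2: returns the lower and upper
# whisker ends, assuming the usual "coef * IQR beyond the quartiles" rule.
whisker_ends <- function(x, coef = 1.5) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)  # Q1 and Q3
  iqr <- q[2] - q[1]
  lower_fence <- q[1] - coef * iqr
  upper_fence <- q[2] + coef * iqr
  # Whiskers stop at the most extreme data points inside the fences
  range(x[x >= lower_fence & x <= upper_fence])
}

set.seed(123)
x <- c(rnorm(100), 5)
ends <- whisker_ends(x)
ends
```

The two values in ends can be appended to the breaks vector in the same way as in the code above; the appended 5 lies outside the upper fence, so it is left out of ends and drawn as an outlier.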


Upvotes: 2
