Dominique Makowski

Reputation: 1673

Visualise differences between factor levels using ggplot

I have a plot in mind that I would like to create, but I don't know how to build it.

I have two data frames: one containing the mean value for each factor level, and the other containing the pairwise differences between these levels.

contrasts <- data.frame(Level1 = c("setosa", "setosa", "versicolor"),
                        Level2 = c("versicolor", "virginica", "virginica"),
                        Diff = c(0.65, 0.46, -0.20),
                        CI_low = c(0.53, 0.35, -0.32),
                        CI_high = c(0.75, 0.56, -0.09))

means <- data.frame(Species = c("setosa", "versicolor", "virginica"),
                    Mean = c(3.42, .77, 2.97))

My goal is to use each mean as the starting point of a triangle that "projects" onto the other level of the corresponding contrast, with the height of the far side equal to the CI (from CI_low to CI_high). It would look something like this (pardon my Paint drawing):

[Image: hand-drawn sketch of the intended plot]

Using the following, I easily added the initial points:

library(tidyverse)

means %>%
  ggplot() + 
  geom_point(aes(x = Species, y= Mean)) + 
  geom_ribbon(data=contrasts, aes(x=Level1, ymin=CI_low, ymax=CI_high))

But I am having trouble adding the triangles. Any ideas? Thanks a lot!

Edit

Thanks to Yuriy Barvinchenko, who provided the code to obtain this:

contrasts %>% 
  bind_cols(id=1:3) %>% 
  inner_join(means, by=c('Level1' = 'Species')) %>% 
  select(id, x=Level1, y=Mean) %>% 
  bind_rows( (contrasts %>% 
                bind_cols(id=1:3) %>% 
                select(id, x=Level2, y=CI_low)),
             (contrasts %>% 
                bind_cols(id=1:3) %>% 
                select(id, x=Level2, y=CI_high))) %>% 
  ggplot(aes(x = x, y= y, group=id)) + 
  geom_polygon()

However, based on the means, I would have expected the middle level (versicolor) to be the lowest, whereas in that plot it is virginica that has the lowest value.

Upvotes: 2

Views: 189

Answers (1)

Yuriy Barvinchenko

Reputation: 1595

If I understand your question correctly, you need code like this:

# Load the tidyverse first: tibble(), the dplyr verbs and ggplot() all come from it
library(tidyverse)

contrasts <- tibble(Level1 = c("setosa", "setosa", "versicolor"),
                    Level2 = c("versicolor", "virginica", "virginica"),
                    Diff = c(0.65, 0.46, -0.20),
                    CI_low = c(0.53, 0.35, -0.32),
                    CI_high = c(0.75, 0.56, -0.09))

means <- tibble(Species = c("setosa", "versicolor", "virginica"),
                Mean = c(3.42, .77, 2.97))

contrasts %>% 
  bind_cols(id=1:3) %>% 
  inner_join(means, by=c('Level1' = 'Species')) %>% 
  select(id, x=Level1, y=Mean) %>% 
  bind_rows( (contrasts %>% 
                bind_cols(id=1:3) %>% 
                select(id, x=Level2, y=CI_low)),
             (contrasts %>% 
                bind_cols(id=1:3) %>% 
                select(id, x=Level2, y=CI_high))) %>% 
  ggplot(aes(x = x, y= y, group=id)) + 
  geom_polygon()
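
Regarding the edit in the question: the polygon above draws CI_low and CI_high as raw y values, but these are bounds for a difference between means, not positions on the scale of the means. A possible variation, sketched below, shifts each interval by the Level1 mean. It assumes that Diff is computed as Level1 minus Level2 (the question does not say so explicitly). The names projected, triangles, y_low and y_high are introduced here for illustration only, and the data are copied unchanged from the question.

```r
library(dplyr)
library(ggplot2)

contrasts <- tibble(Level1 = c("setosa", "setosa", "versicolor"),
                    Level2 = c("versicolor", "virginica", "virginica"),
                    Diff = c(0.65, 0.46, -0.20),
                    CI_low = c(0.53, 0.35, -0.32),
                    CI_high = c(0.75, 0.56, -0.09))

means <- tibble(Species = c("setosa", "versicolor", "virginica"),
                Mean = c(3.42, .77, 2.97))

# Assumption: Diff = Mean(Level1) - Mean(Level2), so on the scale of the means
# the interval at Level2 runs from Mean(Level1) - CI_high to Mean(Level1) - CI_low.
projected <- contrasts %>%
  mutate(id = row_number()) %>%
  inner_join(means, by = c("Level1" = "Species")) %>%
  mutate(y_low = Mean - CI_high,
         y_high = Mean - CI_low)

# Three vertices per contrast: the apex at the Level1 mean,
# then the two projected CI bounds at Level2.
triangles <- bind_rows(
  projected %>% transmute(id, x = Level1, y = Mean),
  projected %>% transmute(id, x = Level2, y = y_low),
  projected %>% transmute(id, x = Level2, y = y_high)
)

p <- ggplot(triangles, aes(x = x, y = y, group = id)) +
  geom_polygon(alpha = 0.3) +
  geom_point(data = means, aes(x = Species, y = Mean), inherit.aes = FALSE)

print(p)
```

With this shift, each triangle ends around the mean of its Level2 species rather than around the size of the difference, which may be closer to the sketch in the question.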

Please note that I use tibble() instead of data.frame() to avoid factors, which makes joining these tables easier.
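
As a small illustration of that point, the sketch below compares the column types produced by each constructor. The object names df_factor, df_char and tb are made up for this example. Since R 4.0.0, data.frame() defaults to stringsAsFactors = FALSE, so the difference mainly matters on older R versions or when the argument is set explicitly.

```r
library(tibble)

species <- c("setosa", "versicolor", "virginica")

# data.frame() converts strings to factors only when asked to
# (this was the default before R 4.0.0)
df_factor <- data.frame(Species = species, stringsAsFactors = TRUE)
df_char <- data.frame(Species = species, stringsAsFactors = FALSE)

# tibble() never converts strings to factors
tb <- tibble(Species = species)

class(df_factor$Species)  # "factor"
class(df_char$Species)    # "character"
class(tb$Species)         # "character"
```

Joining on character columns avoids surprises from factor columns whose levels differ between the two tables.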

Upvotes: 3
