Mari

Reputation: 53

Ggplot2: unique() does not work properly with dplyr piping

I have some problems with the unique() function when piping with dplyr. With my simple example code this works fine:


library(dplyr)
library(ggplot2)

category <- as.factor(c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4))
quality <- as.factor(c(0, 1, 2, 3, 3, 0, 0, 1, 3, 2, 2, 2, 1, 0, 3, 2, 3, 3, 1, 0, 2, 1))
mydata <- data.frame(category, quality)

This adjusts my dataframe so that it is easier to work with and produce a nice plot:

mydata2 <- mydata %>% 
  group_by(category, quality) %>% 
  mutate(count_q = n()) %>% 
  ungroup() %>%
  group_by(category) %>% 
  mutate(tot_q = n(),pc = count_q*100 / tot_q) %>% 
  unique() %>% 
  arrange(category)

myplot <- ggplot(mydata2, aes(x = category, y = pc, fill = quality)) +
  geom_col() +
  geom_text(aes(
    x = category,
    y = pc,
    label = round(pc,digits = 1),
    group = quality),
    position = position_stack(vjust = .5)) +
  ggtitle("test") +
  xlab("cat") +
  ylab("%") +
  labs(fill = "quality")

myplot

Looks exactly like I want:

[image]

However, with my actual data the same code produces this mess:

[image]

I did find a workaround: when I add the line below and use the new mydata.unique as the basis for my ggplot, it works exactly as it does with my example data. The example data does not need this step for some reason, whereas with my actual data the unique() inside the pipe seems to do nothing.

mydata.unique <- unique(mydata2[c("quality","category", "count_q", "tot_q", "pc")])

What I don't understand is why I need to add the above line. Obviously I can't share my actual data. Maybe someone still understands what this is about. Maybe it has to do with other (irrelevant) columns in the data that can't be processed by unique()?

Upvotes: 0

Views: 134

Answers (2)

adalvarez

Reputation: 43

Try distinct() instead of unique(). In this case, though, you probably want summarise() rather than mutate() + distinct().
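A minimal sketch of the summarise route, assuming the `mydata` frame from the question. Here `count()` stands in for `group_by()` + `summarise(n = n())`, and the column names `count_q`, `tot_q` and `pc` simply mirror the question. Because the counting step returns one row per category/quality pair, any other columns in the real data drop out and no de-duplication is needed afterwards:

```r
library(dplyr)

# Example data from the question
category <- as.factor(c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4))
quality  <- as.factor(c(0, 1, 2, 3, 3, 0, 0, 1, 3, 2, 2, 2, 1, 0, 3, 2, 3, 3, 1, 0, 2, 1))
mydata   <- data.frame(category, quality)

mydata2 <- mydata %>%
  # One row per observed category/quality pair, with its row count
  count(category, quality, name = "count_q") %>%
  group_by(category) %>%
  # Category totals and the percentage share of each quality level
  mutate(tot_q = sum(count_q), pc = count_q * 100 / tot_q) %>%
  ungroup()
```

The resulting `mydata2` should work with the ggplot call from the question unchanged.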

Upvotes: 2

Ahorn

Reputation: 3876

If your original df has more variables, try this:

mydata2 <- mydata %>% 
  group_by(category, quality) %>% 
  mutate(count_q = n()) %>% 
  ungroup() %>%
  group_by(category) %>% 
  mutate(tot_q = n(),pc = count_q*100 / tot_q) %>% 
  distinct(category, quality, count_q, tot_q, pc, .keep_all = TRUE) %>% 
  arrange(category)

Alternatively, as @adalvarez mentioned, replace mutate() with summarise().
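To see why extra variables matter, here is a small sketch with a made-up data frame. The `id` column is hypothetical and only stands in for whatever additional columns the real data has. `unique()` compares entire rows, so a column that differs on every row prevents any row from counting as a duplicate, while `distinct()` on named columns ignores it:

```r
library(dplyr)

# Hypothetical stand-in for the real data: the example columns
# plus an `id` column that differs on every row
mydata <- data.frame(
  id       = 1:6,
  category = factor(c(1, 1, 1, 2, 2, 2)),
  quality  = factor(c(0, 0, 1, 1, 1, 1))
)

with_counts <- mydata %>%
  group_by(category, quality) %>%
  mutate(count_q = n()) %>%
  ungroup()

# unique() compares whole rows, so the differing `id` keeps all 6 rows
n_unique <- nrow(unique(with_counts))

# distinct() on the named columns ignores `id` and collapses to 3 rows
n_distinct <- nrow(distinct(with_counts, category, quality, count_q))
```

This would also explain why the question's workaround helps: subsetting to the five relevant columns before calling `unique()` removes the columns that made each row different.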

Upvotes: 1
