Reputation: 53
I have some problems with the unique() function when piping with dplyr. With my simple example code this works fine:
library(dplyr)
library(ggplot2)
category <- as.factor(c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4))
quality <- as.factor(c(0, 1, 2, 3, 3, 0, 0, 1, 3, 2, 2, 2, 1, 0, 3, 2, 3, 3, 1, 0, 2, 1))
mydata <- data.frame(category, quality)
This adjusts my dataframe so that it is easier to work with and produce a nice plot:
mydata2 <- mydata %>%
  group_by(category, quality) %>%
  mutate(count_q = n()) %>%
  ungroup() %>%
  group_by(category) %>%
  mutate(tot_q = n(), pc = count_q * 100 / tot_q) %>%
  unique() %>%
  arrange(category)
myplot <- ggplot(mydata2, aes(x = category, y = pc, fill = quality)) +
  geom_col() +
  geom_text(aes(
    x = category,
    y = pc,
    label = round(pc, digits = 1),
    group = quality),
    position = position_stack(vjust = .5)) +
  ggtitle("test") +
  xlab("cat") +
  ylab("%") +
  labs(fill = "quality")
myplot
Looks exactly like I want:
However, with my actual data the same code produces this mess:
I did find a workaround: when I add the line below and use the new mydata.unique as the basis for my ggplot, it works exactly as it does with my example data. The example data does not need this for some reason, whereas with my actual data the unique() inside the pipe seems to do nothing.
mydata.unique <- unique(mydata2[c("quality","category", "count_q", "tot_q", "pc")])
What I don't understand is why I need to add the line above. Obviously I can't share my actual data, but maybe someone still understands what is going on. Could it have to do with other (irrelevant) columns in the data that unique() can't process?
Upvotes: 0
Views: 134
Reputation: 43
Try distinct() instead of unique(). In this case you probably want summarise() rather than mutate() + distinct().
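A minimal sketch of the summarise() route, using the toy data from the question. The `id` column is a hypothetical stand-in for the unrelated columns in the real data, and the `.groups` argument assumes dplyr 1.0.0 or later. Because summarise() returns one row per group, no de-duplication step is needed afterwards:

```r
library(dplyr)

# Toy data from the question, plus a hypothetical `id` column that
# stands in for the extra columns present in the real data.
category <- as.factor(c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4))
quality  <- as.factor(c(0, 1, 2, 3, 3, 0, 0, 1, 3, 2, 2, 2, 1, 0, 3, 2, 3, 3, 1, 0, 2, 1))
mydata   <- data.frame(id = seq_along(category), category, quality)

mydata2 <- mydata %>%
  group_by(category, quality) %>%
  # One row per category/quality pair; all other columns are dropped.
  summarise(count_q = n(), .groups = "drop") %>%
  group_by(category) %>%
  # Total per category is the sum of the per-pair counts.
  mutate(tot_q = sum(count_q), pc = count_q * 100 / tot_q) %>%
  ungroup() %>%
  arrange(category)
```

The resulting mydata2 has the same columns the plot uses (category, quality, count_q, tot_q, pc), so it should be usable in the ggplot call unchanged.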
Upvotes: 2
Reputation: 3876
If your original df has more variables, try this:
mydata2 <- mydata %>%
  group_by(category, quality) %>%
  mutate(count_q = n()) %>%
  ungroup() %>%
  group_by(category) %>%
  mutate(tot_q = n(), pc = count_q * 100 / tot_q) %>%
  distinct(category, quality, count_q, tot_q, pc, .keep_all = TRUE) %>%
  arrange(category)
Or maybe, as mentioned by @adalvarez, replace mutate with summarise.
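A small illustration of why unique() can appear to do nothing, assuming the real data has at least one extra column that differs from row to row (the `id` column and the `df` data frame here are made up for the example). unique() compares whole rows, so such a column keeps otherwise identical rows apart, while distinct() with named columns compares only those columns:

```r
library(dplyr)

# Hypothetical data: `id` differs on every row, the other columns repeat.
df <- data.frame(
  id       = 1:4,
  category = c("a", "a", "a", "b"),
  quality  = c("x", "x", "y", "x")
)

# All columns are compared, so no row counts as a duplicate.
n_unique <- nrow(unique(df))

# Only category and quality are compared, so the repeated pair collapses.
n_distinct <- nrow(distinct(df, category, quality))
```

This would also explain why the question's workaround, which subsets to the five relevant columns before calling unique(), behaves like the example data.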
Upvotes: 1