Reputation: 83
I want to split the hashtags in one column into separate columns. After I use the `separate` function, I get a lot of NAs when I plot with ggplot. How can I remove the NAs from my plot? My code looks like this:
df %>%
separate(terms, into = paste0("t", 1:5), sep = ";") %>%
pivot_longer(-year) %>%
group_by(year, value) %>%
count(value) %>%
ggplot(aes(x = factor(year), y = n, fill = value, label = NA)) +
geom_col(position = position_dodge()) +
geom_text(position = position_dodge(1))
My data looks like this:
  terms            year
1 #A;#B;#C;#D;E    2017
2 #B;#C;#D         2016
3 #C;#D;#E#G       2021
4 #D;#E;#F         2020
...
Upvotes: 0
Views: 137
Reputation: 24790
Try tidyr::separate_rows instead:
library(tidyverse)
df %>%
separate_rows(terms, sep = ";") %>%
group_by(year, terms) %>%
count(terms) %>%
ggplot(aes(x = factor(year), y = n, fill = terms, label = NA)) +
geom_col(position = position_dodge()) +
geom_text(aes(label = terms), position = position_dodge(1))
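For context, here is a minimal sketch of why the NAs show up with `separate` but not with `separate_rows`. The `toy` data frame is a made-up two-row example in the same shape as the question's data, not the real `df`:

```r
library(tidyr)

# Hypothetical two-row sample in the same shape as the question's data
toy <- data.frame(
  terms = c("#A;#B;#C", "#D"),
  year = c(2017L, 2016L)
)

# separate() needs a fixed number of columns, so rows with fewer
# pieces are padded with NA (fill = "right" only silences the warning)
wide <- separate(toy, terms, into = paste0("t", 1:3), sep = ";", fill = "right")

# separate_rows() emits one row per piece, so nothing is padded
long <- separate_rows(toy, terms, sep = ";")

print(wide)
print(long)
```

Because `separate_rows` never creates the padding cells, no NA category reaches `count()` or the plot.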
You might also want to include tidyr::complete:
df %>%
separate_rows(terms, sep = ";") %>%
group_by(year, terms) %>%
count(terms) %>%
ungroup() %>%
complete(year, terms, fill = list(n = 0)) %>%
ggplot(aes(x = factor(year), y = n, fill = terms, label = NA)) +
geom_col(position = position_dodge(preserve = "single")) +
scale_fill_discrete(drop = FALSE) +
scale_x_discrete(drop = FALSE) +
geom_text(aes(label = n), size = 3, position = position_dodge(width = 1))
Or with only the top 3 terms labeled:
df %>%
separate_rows(terms, sep = ";") %>%
group_by(year, terms) %>%
count(terms) %>%
ungroup() %>%
complete(year, terms, fill = list(n = 0)) -> new_df
ggplot(new_df, aes(x = factor(year), y = n, fill = terms, label = NA)) +
geom_col(position = position_dodge(preserve = "single")) +
scale_fill_discrete(drop = FALSE) +
scale_x_discrete(drop = FALSE) +
geom_text(data = new_df %>%
            group_by(year) %>%
            mutate(n = case_when(rank(-n, ties.method = "random") <= 3 ~ n,
                                 TRUE ~ NA_real_)),
          aes(label = terms), size = 3, position = position_dodge(width = 1))
Sample Data:
df <- structure(list(terms = c("#A;#B;#C;#D;#E", "#C;#D;#E", "#B;#C;#D",
"#A", "#C;#D;#E;#G", "#D;#E;#F", "#D"), year = c(2017L, 2017L,
2016L, 2016L, 2021L, 2020L, 2020L)), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7"))
Upvotes: 1