kvjing

How to remove NA values in R

I want to separate the hashtags in one column into different columns. After I use the separate() function, I get a lot of NAs when I make a ggplot. How can I remove the NAs from my ggplot? My code is:

library(tidyverse)

df %>% 
  # into = paste0("t", 1:5) always creates five columns;
  # rows with fewer than five hashtags are padded with NA
  separate(terms, into = paste0("t", 1:5), sep = ";") %>% 
  pivot_longer(-year) %>% 
  group_by(year, value) %>% 
  count(value) %>% 
  ggplot(aes(x = factor(year), y = n, fill = value, label = NA)) +
  geom_col(position = position_dodge()) +
  geom_text(position = position_dodge(1))

My data looks like this:

    terms            year
1   #A;#B;#C;#D;#E   2017
2   #B;#C;#D         2016
3   #C;#D;#E;#G      2021
4   #D;#E;#F         2020

...


Answers (1)

Ian Campbell

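Try tidyr::separate_rows() instead. It splits each string into one row per hashtag, so rows with fewer hashtags are never padded with NA: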

library(tidyverse)

df %>%
  separate_rows(terms, sep = ";") %>%   # one row per hashtag, no NA padding
  group_by(year, terms) %>% 
  count(terms) %>% 
  ggplot(aes(x = factor(year), y = n, fill = terms)) +
  geom_col(position = position_dodge()) +
  # dodge the labels by the bars' width (0.9 is geom_col()'s default)
  geom_text(aes(label = terms), position = position_dodge(width = 0.9))
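If you would rather keep the separate() + pivot_longer() approach from the question, the NAs come from separate() padding every row that has fewer than five hashtags. A minimal sketch, assuming tidyr >= 1.0 (where pivot_longer() has a values_drop_na argument) and using a hypothetical two-row subset of the data called small, drops those padding values during the pivot and ends up with the same counts as separate_rows():

```r
library(dplyr)
library(tidyr)

# Hypothetical two-row subset of the question's data
small <- data.frame(terms = c("#A;#B;#C;#D;#E", "#B;#C;#D"),
                    year  = c(2017L, 2016L),
                    stringsAsFactors = FALSE)

# separate() always creates five columns; the shorter row is padded with NA
# (fill = "right" only silences the "missing pieces" warning)
wide <- separate(small, terms, into = paste0("t", 1:5), sep = ";", fill = "right")

# values_drop_na = TRUE discards the padding NAs while pivoting
long_counts <- wide %>%
  pivot_longer(-year, values_drop_na = TRUE) %>%
  count(year, value)

# separate_rows() reaches the same counts without ever creating NAs
rows_counts <- small %>%
  separate_rows(terms, sep = ";") %>%
  count(year, terms)

print(long_counts)
print(rows_counts)
```

separate_rows() is simpler here because the number of hashtags per row varies, so you never have to guess how many columns to create.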


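You might also want to add tidyr::complete(), which fills in every year/term combination with a count of 0 so that each year gets the same set of bars: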

df %>%
  separate_rows(terms, sep = ";") %>%
  group_by(year, terms) %>% 
  count(terms) %>% 
  ungroup() %>%
  # add the missing year/term combinations with a count of 0
  complete(year, terms, fill = list(n = 0)) %>%
  ggplot(aes(x = factor(year), y = n, fill = terms)) +
  geom_col(position = position_dodge(preserve = "single")) +
  scale_fill_discrete(drop = FALSE) +
  scale_x_discrete(drop = FALSE) +
  geom_text(aes(label = n), size = 3, position = position_dodge(width = 0.9))
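To see what complete() contributes, here is a small sketch on a hypothetical count table (counts is an illustrative name): every year/term combination that is missing gets a new row with n = 0. It uses fill = list(n = 0L) so that n stays an integer column.

```r
library(tidyr)

# Hypothetical counts: #A only appears in 2017, #B and #D only in 2016
counts <- data.frame(year  = c(2016L, 2016L, 2017L),
                     terms = c("#B", "#D", "#A"),
                     n     = c(1L, 2L, 3L),
                     stringsAsFactors = FALSE)

# Every year x term combination now has a row; the added ones get n = 0
filled <- complete(counts, year, terms, fill = list(n = 0L))
print(filled)
```

With 2 years and 3 distinct terms the result has 6 rows, 3 of which are the zero-count rows added by complete().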


Or, to label only the top 3 terms in each year:

new_df <- df %>%
  separate_rows(terms, sep = ";") %>%
  group_by(year, terms) %>% 
  count(terms) %>% 
  ungroup() %>%
  complete(year, terms, fill = list(n = 0))

ggplot(new_df, aes(x = factor(year), y = n, fill = terms)) +
  geom_col(position = position_dodge(preserve = "single")) +
  scale_fill_discrete(drop = FALSE) +
  scale_x_discrete(drop = FALSE) +
  # label data: keep n for the 3 largest counts per year and set it to NA
  # otherwise; replace() keeps n's type whether it is integer or double
  geom_text(data = new_df %>%
              group_by(year) %>%
              mutate(n = replace(n, rank(-n, ties.method = "random") > 3, NA)),
            aes(label = terms), size = 3, position = position_dodge(width = 0.9))
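The label layer keeps every row and only sets n to NA outside the top 3, rather than filtering those rows out. As I understand ggplot2's dodging, label data filtered down to three rows per year would be dodged across only three groups and drift away from their bars, whereas NA rows are dropped only at draw time (with a "Removed ... rows" warning). The sketch below reproduces just the data step on the sample data. It uses ties.method = "first" instead of "random" so the result is reproducible, and if_else() with NA_integer_ because n is kept as an integer here; label_df is an illustrative name.

```r
library(dplyr)
library(tidyr)

# Same values as the sample data in this answer
df <- data.frame(terms = c("#A;#B;#C;#D;#E", "#C;#D;#E", "#B;#C;#D",
                           "#A", "#C;#D;#E;#G", "#D;#E;#F", "#D"),
                 year  = c(2017L, 2017L, 2016L, 2016L, 2021L, 2020L, 2020L),
                 stringsAsFactors = FALSE)

new_df <- df %>%
  separate_rows(terms, sep = ";") %>%
  count(year, terms) %>%
  complete(year, terms, fill = list(n = 0L))

# Keep n for the 3 largest counts per year, NA elsewhere;
# ties.method = "first" breaks ties by row order, so the choice is reproducible
label_df <- new_df %>%
  group_by(year) %>%
  mutate(n = if_else(rank(-n, ties.method = "first") <= 3, n, NA_integer_)) %>%
  ungroup()

print(label_df, n = Inf)
```

With 4 years and 7 distinct hashtags, label_df has 28 rows and exactly 3 non-NA counts per year; for 2017 those are #C, #D and #E (each counted twice).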


Sample Data:

df <- structure(list(terms = c("#A;#B;#C;#D;#E", "#C;#D;#E", "#B;#C;#D", 
"#A", "#C;#D;#E;#G", "#D;#E;#F", "#D"), year = c(2017L, 2017L, 
2016L, 2016L, 2021L, 2020L, 2020L)), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7"))
