How to not include NA observations in grouping when using group_by() followed by summarize() with dplyr?

Question

I have a data frame of phone numbers, emails, and names. Some emails are duplicated, with different spellings of the name. I don't care which name is kept, so I group by email and summarize, keeping the first name and phone number in each group. However, some email addresses are missing, and I don't want those rows to be grouped together, because I need to keep their distinct phone numbers. Using a simplified example, my data is:

library(dplyr)

data <- data.frame(x=c(1,2,3,4,5,5,5,6), y=c("a","b","c",NA,"d","d","d",NA))
data %>% group_by(y) %>% summarize(x=first(x))

I lose the number 6 when I do this. How do I keep the NAs from grouping together and being summarized?
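To see where the 6 goes, count the rows in each group. group_by() treats NA as a key value of its own, so the rows with x = 4 and x = 6 fall into one group, and first() keeps only 4. This is a small sketch on the toy data above; the n column is an added count, not part of the original code.

```r
library(dplyr)

data <- data.frame(x = c(1, 2, 3, 4, 5, 5, 5, 6),
                   y = c("a", "b", "c", NA, "d", "d", "d", NA))

# Count rows per group: both NA rows end up in a single NA group
counts <- data %>%
  group_by(y) %>%
  summarize(n = n(), x = first(x))
counts
```

The NA group reports n = 2 and x = 4, which is why 6 disappears from the summary.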

Ronak Shah · Accepted Answer

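Probably the simplest approach is to handle the NA rows separately: summarize only the rows where y is not NA, then bind the NA rows back onto the summarized result unchanged.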

library(dplyr)

data %>%
  filter(!is.na(y)) %>%
  group_by(y) %>%
  summarize(x=first(x)) %>%
  bind_rows(data %>% filter(is.na(y)))

# A tibble: 6 x 2
#  y         x
#   <chr> <dbl>
#1 a         1
#2 b         2
#3 c         3
#4 d         5
#5 NA        4
#6 NA        6
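If you prefer a single pipeline, one alternative (a different technique from the answer above, not part of it) is to add a temporary grouping key that is unique for each NA row and NA otherwise. Rows with a value still group by y alone, while every NA row lands in a group by itself. The column name na_key is hypothetical, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

data <- data.frame(x = c(1, 2, 3, 4, 5, 5, 5, 6),
                   y = c("a", "b", "c", NA, "d", "d", "d", NA))

result <- data %>%
  # Hypothetical helper key: the row number for NA rows, NA for all others,
  # so each NA row forms its own group while non-NA rows group by y alone
  mutate(na_key = if_else(is.na(y), row_number(), NA_integer_)) %>%
  group_by(y, na_key) %>%
  summarize(x = first(x), .groups = "drop") %>%
  # Drop the helper key once grouping is done
  select(-na_key)
result
```

This returns the same six rows as the filter-and-bind approach (a, b, c, d, then the two NA rows with x = 4 and x = 6). The filter-and-bind version is arguably easier to read; the helper-key version avoids referencing data twice.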
