I have a data frame of phone numbers, emails and names. Some emails are duplicated, with different spellings of the name. I don't really care which name remains, so I am grouping by email and summarizing to choose the first observation of name and phone number. However, some email addresses are missing, and I want to keep those rows from being grouped together so that I keep their unique phone numbers. Using a simplified example (x stands in for the phone numbers, y for the emails), my data is:
library(dplyr)
data <- data.frame(x=c(1,2,3,4,5,5,5,6), y=c("a","b","c",NA,"d","d","d",NA))
data %>% group_by(y) %>% summarize(x=first(x))
I lose the number 6 when I do this. How do I keep the NAs from grouping together and being summarized?
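The reason 6 disappears can be checked directly. This is a small sketch, assuming dplyr is installed: `group_by()` puts every `NA` in `y` into one shared group, so `first(x)` keeps only the 4 from that group and drops the 6.

```r
library(dplyr)

data <- data.frame(x = c(1, 2, 3, 4, 5, 5, 5, 6),
                   y = c("a", "b", "c", NA, "d", "d", "d", NA))

# Count rows per value of y: both NA rows end up in the same group
counts <- data %>% count(y)
print(counts)
```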
Probably handle the NAs separately and bind them back onto the summarized data.
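library(dplyr)

data %>%
  # Summarize only the rows that have a value in y
  filter(!is.na(y)) %>%
  group_by(y) %>%
  summarize(x=first(x)) %>%
  # Append the NA rows unchanged, so none of them are collapsed
  bind_rows(data %>% filter(is.na(y)))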
# A tibble: 6 x 2
#   y         x
#   <fct> <dbl>
# 1 a         1
# 2 b         2
# 3 c         3
# 4 d         5
# 5 NA        4
# 6 NA        6
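An alternative that stays in a single pipeline is to give each NA row its own group instead of splitting the data. This is a sketch, not the method above: the helper column `na_row` is a hypothetical name introduced here, set to 0 for rows that have a `y` value and to the row's own position when `y` is `NA`. It assumes dplyr 1.0.0 or later for the `.groups` argument.

```r
library(dplyr)

data <- data.frame(x = c(1, 2, 3, 4, 5, 5, 5, 6),
                   y = c("a", "b", "c", NA, "d", "d", "d", NA))

result <- data %>%
  # Hypothetical helper key: 0 when y is present, the row number when y is NA,
  # so each NA row forms a group of its own while real values still group together
  mutate(na_row = if_else(is.na(y), row_number(), 0L)) %>%
  group_by(y, na_row) %>%
  # Keep the first observation of each group, as in the question
  summarize(x = first(x), .groups = "drop") %>%
  # Drop the helper column once grouping is done
  select(-na_row)

print(result)
```

Because the helper key is numeric and separate from `y`, it cannot collide with a real email value, and both NA rows (with x = 4 and x = 6) survive.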