Reputation: 1013
I am trying to consolidate duplicate data that looks like the following:
a b c
1 1 1 1
2 2 2 2
3 1 1 NA
4 4 4 4
5 5 5 NA
6 2 2 2
In this case I want all distinct rows based on a and b and if multiple exist delete the one where c is NA. For example if I run the following:
distinct(df,a,b,c) %>% group_by(a,b) %>% filter(!is.na(c))
I will get what I need except I lose row 5. If the group consist of only 1 row or all the entries in the group are NAs then I lose those records. I am not sure what else I can do to avoid this issue. The dummy dataset can be generated with the following line:
df<-data.frame(a=c(1,2,1,4,5,2),b=c(1,2,1,4,5,2),c=c(1,2,NA,4,NA,2))
Upvotes: 1
Views: 245
Reputation: 66819
You can add an exception to the rule !is.na(c)
with the "or" operator |
:
distinct(df,a,b,c) %>% group_by(a,b) %>% filter( !is.na(c) | n() == 1 )
So it's kept if either (1) non-NA
or (2) the only row.
Upvotes: 3