rjss
rjss

Reputation: 1013

Consolidate "duplicate" rows in R

I am trying to consolidate duplicate data that looks like the following:

  a b  c
1 1 1  1
2 2 2  2
3 1 1 NA
4 4 4  4
5 5 5 NA
6 2 2  2

In this case I want all distinct rows based on a and b and if multiple exist delete the one where c is NA. For example if I run the following:

distinct(df,a,b,c) %>% group_by(a,b) %>% filter(!is.na(c))

I will get what I need except I lose row 5. If the group consist of only 1 row or all the entries in the group are NAs then I lose those records. I am not sure what else I can do to avoid this issue. The dummy dataset can be generated with the following line:

df<-data.frame(a=c(1,2,1,4,5,2),b=c(1,2,1,4,5,2),c=c(1,2,NA,4,NA,2))

Upvotes: 1

Views: 245

Answers (1)

Frank
Frank

Reputation: 66819

You can add an exception to the rule !is.na(c) with the "or" operator |:

distinct(df,a,b,c) %>% group_by(a,b) %>% filter( !is.na(c) | n() == 1 )

So it's kept if either (1) non-NA or (2) the only row.

Upvotes: 3

Related Questions