Group By Non-Missing Values Dplyr

Question

I have a large data frame (≈ 2M observations) that has many duplicates. I want to delete those duplicates, but I need to keep the non-duplicate rows only where another column has a value that is not missing (NA). It can be any value, as long as it is not NA. For example:

 data <- airquality
 data[4:10,3] <- rep(NA,7)
 data[1:5,4] <- NA

 library(dplyr)

 new.data <- data %>% 
    group_by(Ozone) %>% 
    filter(Wind==????)

Here you can see I am not sure what to filter by, as marked by the "Wind==????" placeholder. As long as there is any value (numeric or nominal) in the Wind column, I would like to keep the unique rows, while deleting the duplicates and the rows where Wind is missing.

Thank you!

akrun · Accepted Answer

We can do

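library(dplyr)

# Within each Ozone group, keep the first row per Wind value and drop rows where Wind is NA
data %>% 
     group_by(Ozone) %>%
     filter(!duplicated(Wind) & !is.na(Wind))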
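
To see what this filter does end to end, here is a minimal, self-contained sketch that rebuilds the example data from the question and applies the same condition. The `alt.data` variant using `distinct()` is not part of the answer above; it is an assumed alternative that should keep the same rows, shown only for comparison.

```r
library(dplyr)

# Rebuild the example data from the question
data <- airquality
data[4:10, 3] <- NA   # column 3 of airquality is Wind
data[1:5, 4]  <- NA   # column 4 of airquality is Temp

# Approach from the answer: per Ozone group, keep the first row for each
# Wind value and drop rows where Wind is missing
new.data <- data %>%
  group_by(Ozone) %>%
  filter(!duplicated(Wind) & !is.na(Wind)) %>%
  ungroup()

# Alternative sketch (an assumption, not from the answer): drop missing
# Wind first, then keep the first row for each Ozone/Wind pair
alt.data <- data %>%
  filter(!is.na(Wind)) %>%
  distinct(Ozone, Wind, .keep_all = TRUE)

# Quick checks: no missing Wind values remain, and both approaches
# keep the same number of rows
anyNA(new.data$Wind)
nrow(new.data) == nrow(alt.data)
```

Note that rows where Ozone itself is NA form their own group under `group_by(Ozone)`, so they are de-duplicated on Wind like any other group rather than removed.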
