Reputation: 1558
I have a large data frame (≈ 2M observations) that has many duplicates. I want to delete those duplicates, but keep the unique values only where another column is not missing (NA). That other column can hold any value imaginable, as long as it is not NA. For example:
data <- airquality
data[4:10,3] <- rep(NA,7)
data[1:5,4] <- NA
library(dplyr)
new.data <- data %>%
  group_by(Ozone) %>%
  filter(Wind==????)
As the "Wind==????" placeholder shows, I am not sure what to filter by. As long as the Wind column holds any value (numeric or nominal), I would like to keep the unique rows, and drop both the duplicates and the rows where Wind is missing.
Thank you!
Upvotes: 3
Views: 907
Reputation: 886938
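We can filter on both conditions within each Ozone group, keeping the first occurrence of every Wind value that is not NA:
data %>%
  group_by(Ozone) %>%
  filter(!duplicated(Wind) & !is.na(Wind))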
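To see what the two conditions do, here is a minimal sketch on a small made-up data frame. The `toy` data and the `kept` name are assumptions for illustration only and are not from the question. Within each Ozone group, `duplicated(Wind)` flags every repeat of a Wind value after its first occurrence, and `is.na(Wind)` flags the missing ones, so negating both keeps one row per distinct non-NA Wind value.

```r
library(dplyr)

# Hypothetical toy data (not from the question): Ozone 10 has a repeated
# Wind value and an NA, Ozone 20 has two distinct Wind values, and one
# row has a missing Ozone.
toy <- data.frame(
  Ozone = c(10, 10, 10, 20, 20, NA),
  Wind  = c(5,  5,  NA, 7,  8,  9)
)

kept <- toy %>%
  group_by(Ozone) %>%
  # keep the first occurrence of each non-NA Wind value per Ozone group
  filter(!duplicated(Wind) & !is.na(Wind)) %>%
  ungroup()

print(kept)
```

Note that group_by treats rows with a missing Ozone as one group of their own, so they are deduplicated among themselves rather than dropped. If those rows should go too, adding `!is.na(Ozone)` to the filter would be one way to do it.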
Upvotes: 3