Reputation: 73
I have data (shown below) from which I want to remove outliers: all observations that lie outside the 1st and 99th percentiles. The problem is that there are a lot of NA observations, and I want to keep those.
combined
date change_cds
<date> <dbl>
2005-12-31 -2.5975486
2005-11-30 -1.5873349
2005-11-30 NA
2005-11-30 NA
2005-11-30 -31.7240875
2005-12-31 -8.7011377
2005-12-31 9.5310180
2005-12-31 -18.9242000
2005-12-31 -3.8466281
2005-12-31 5.7158414
2005-11-30 13.0053128
2005-11-30 10.2129495
2005-11-30 NA
2005-11-30 -13.9152604
2005-11-30 -9.1434206
Previously I had this code, which worked when there were no NA values in the dataset:
combined <- combined %>%
filter(change_cds < quantile(combined$change_cds, (1-0.01)) & change_cds > quantile(combined$change_cds, 0.01))
However, once NA values are introduced, the code falls apart. I want to exclude all observations that lie outside the 1st and 99th percentiles, with the percentiles computed over the non-NA observations only. But I want to keep all the rows where change_cds is NA.
Thanks in advance.
Upvotes: 0
Views: 437
Reputation: 389135
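quantile() throws an error on NA values unless na.rm = TRUE is set, and filter() drops rows where the condition evaluates to NA, so those rows have to be kept explicitly with is.na(). You can use -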
library(dplyr)
combined <- combined %>%
  filter((change_cds < quantile(change_cds, 0.99, na.rm = TRUE) &
          change_cds > quantile(change_cds, 0.01, na.rm = TRUE)) | is.na(change_cds))
In base R -
combined <- subset(combined, change_cds < quantile(change_cds, 0.99, na.rm = TRUE) &
change_cds > quantile(change_cds, 0.01, na.rm = TRUE) | is.na(change_cds))
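As a rough illustration of the same idea, here is a small self-contained base R sketch. The data is a toy subset of the values in the question, and the names bounds, keep and trimmed are made up for this example. It computes both cut-offs once and then keeps rows that are either NA or strictly inside them.

```r
# Toy data: a few values from the question plus some NA rows (not the full dataset).
combined <- data.frame(
  change_cds = c(-2.5975486, -1.5873349, NA, NA, -31.7240875, -8.7011377,
                 9.5310180, -18.9242000, 13.0053128, NA)
)

# Compute the 1st and 99th percentile once, ignoring NA values.
bounds <- quantile(combined$change_cds, c(0.01, 0.99), na.rm = TRUE)

# Keep a row if it is NA, or if it lies strictly between the two cut-offs.
# is.na() comes first so that NA rows evaluate to TRUE rather than NA.
keep <- is.na(combined$change_cds) |
  (combined$change_cds > bounds[1] & combined$change_cds < bounds[2])

trimmed <- combined[keep, , drop = FALSE]
```

With so few values, the 1st and 99th percentiles fall just inside the minimum and maximum, so only those two rows are dropped and the three NA rows survive. On a larger dataset the cut-offs would remove roughly 1% of the non-NA observations at each end.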
Upvotes: 1