Ivan

How to remove outliers but keep NA

I have data (shown below) from which I want to remove outliers, that is, every observation that lies outside the 1st and 99th percentiles. The problem is that there are many NA observations, and I want to keep those rows.

combined

date        change_cds
<date>           <dbl>
2005-12-31  -2.5975486
2005-11-30  -1.5873349
2005-11-30          NA
2005-11-30          NA
2005-11-30 -31.7240875
2005-12-31  -8.7011377
2005-12-31   9.5310180
2005-12-31 -18.9242000
2005-12-31  -3.8466281
2005-12-31   5.7158414
2005-11-30  13.0053128
2005-11-30  10.2129495
2005-11-30          NA
2005-11-30 -13.9152604
2005-11-30  -9.1434206

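To make the examples reproducible, the sample rows above can be rebuilt as below. This is a sketch: the question prints a tibble, whereas a plain data.frame is used here so that only base R is needed.

```r
# Sample data from the question, rebuilt as a base R data.frame
# (the original is printed as a tibble; the values are the same).
combined <- data.frame(
  date = as.Date(c("2005-12-31", "2005-11-30", "2005-11-30", "2005-11-30",
                   "2005-11-30", "2005-12-31", "2005-12-31", "2005-12-31",
                   "2005-12-31", "2005-12-31", "2005-11-30", "2005-11-30",
                   "2005-11-30", "2005-11-30", "2005-11-30")),
  change_cds = c(-2.5975486, -1.5873349, NA, NA, -31.7240875, -8.7011377,
                 9.5310180, -18.9242000, -3.8466281, 5.7158414, 13.0053128,
                 10.2129495, NA, -13.9152604, -9.1434206)
)

str(combined)
```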
Previously I used the following code, which worked when the dataset contained no NA values:

library(dplyr)
# Keep rows strictly between the 1st and 99th percentiles (works only without NA).
combined <- combined %>%
  filter(change_cds < quantile(change_cds, 0.99) & change_cds > quantile(change_cds, 0.01))

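However, once NA values are present the code breaks: quantile() stops with an error because na.rm defaults to FALSE, and filter() would drop the NA rows anyway, since comparing NA with a number gives NA. I want to exclude all observations outside the 1st and 99th percentiles, with the percentiles computed from the non-NA values only, but keep every row where change_cds is NA.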

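A minimal reproduction of both problems, using a few values from the sample data (the vector x is illustrative only):

```r
# A few values from the sample data, including NA.
x <- c(-2.5975486, NA, 9.5310180, -31.7240875, NA)

# Without na.rm, quantile() refuses to work on a vector that contains NA.
err <- tryCatch(quantile(x, 0.99), error = function(e) conditionMessage(e))
print(err)

# With na.rm = TRUE the percentile is computed from the non-missing values only.
q99 <- quantile(x, 0.99, na.rm = TRUE)

# Comparing NA with a number still gives NA for the NA elements.
cmp <- x < q99
print(cmp)
```

The dplyr documentation for filter() notes that rows where the condition evaluates to NA are dropped, which is why adding na.rm = TRUE alone is not enough to keep the NA rows.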
Thanks in advance.

Answers (1)

Ronak Shah

You can use:

library(dplyr)

combined <- combined %>%
  # Percentiles use the non-NA values only; rows with NA are kept explicitly.
  filter((change_cds < quantile(change_cds, 0.99, na.rm = TRUE) &
            change_cds > quantile(change_cds, 0.01, na.rm = TRUE)) |
           is.na(change_cds))

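The | is.na(change_cds) part works because of R's three-valued logic: a comparison involving NA yields NA, NA & NA is NA, and NA | TRUE is TRUE. Also, & binds more tightly than |, so a & b | c already means (a & b) | c. A small base R check, with hypothetical fixed cutoffs standing in for the percentiles:

```r
# Illustrative values and hypothetical fixed cutoffs (stand-ins for the percentiles).
x  <- c(-31.7240875, -2.5975486, NA, 13.0053128)
lo <- -20
hi <- 10

# Comparisons with NA give NA, so the third element is NA here.
in_range <- x > lo & x < hi
print(in_range)   # c(FALSE, TRUE, NA, FALSE)

# NA | TRUE is TRUE, so the NA element is kept.
keep <- in_range | is.na(x)
print(keep)       # c(FALSE, TRUE, TRUE, FALSE)

# & is evaluated before |, so the explicit parentheses do not change the result.
stopifnot(identical(keep, x > lo & x < hi | is.na(x)))
```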
In base R:

# subset() treats an NA condition as FALSE, so NA rows are kept with | is.na().
combined <- subset(combined, (change_cds < quantile(change_cds, 0.99, na.rm = TRUE) &
        change_cds > quantile(change_cds, 0.01, na.rm = TRUE)) | is.na(change_cds))

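To check the result, here is a sketch that wraps the same logic in a base R function and applies it to the change_cds values from the question. The name trim_outliers and its arguments are hypothetical, not from any package. It indexes with [ rather than subset(), which is safe here only because the logical index never contains NA (with [, an NA index would produce a row of NAs instead of dropping it).

```r
# Hypothetical helper: keep rows of df whose column `col` lies strictly between
# the `lower` and `upper` quantiles (computed without NA), plus rows where it is NA.
trim_outliers <- function(df, col, lower = 0.01, upper = 0.99) {
  x <- df[[col]]
  q <- quantile(x, c(lower, upper), na.rm = TRUE)
  keep <- (x > q[[1]] & x < q[[2]]) | is.na(x)   # never NA, so safe for [
  df[keep, , drop = FALSE]
}

# change_cds values copied from the question's sample data.
combined <- data.frame(change_cds = c(
  -2.5975486, -1.5873349, NA, NA, -31.7240875, -8.7011377, 9.5310180,
  -18.9242000, -3.8466281, 5.7158414, 13.0053128, 10.2129495, NA,
  -13.9152604, -9.1434206
))

trimmed <- trim_outliers(combined, "change_cds")
print(nrow(trimmed))
```

With R's default (type 7) quantiles and these 12 non-missing values, the 1st and 99th percentiles fall just inside the minimum and maximum, so only -31.7240875 and 13.0053128 are dropped and all three NA rows are kept, leaving 13 rows.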