arijeet

Reputation: 31

replacing specific values with NA using na_if

My data frame contains some values equal to 9.969210e+36, and I want to replace them with NA. It looks like this:

# A tibble: 1,308 x 3
       IMD     CRU dts       
     <dbl>   <dbl> <date>    
 1 9.97e36 9.97e36 1901-01-01
 2 9.97e36 9.97e36 1902-01-01
 3 9.97e36 9.97e36 1903-01-01
 4 9.97e36 9.97e36 1904-01-01 

dput(head(df))

structure(list(IMD = c(9.96920996838687e+36, 9.96920996838687e+36, 
9.96920996838687e+36, 9.96920996838687e+36, 9.96920996838687e+36, 
9.96920996838687e+36), CRU = c(9.96920996838687e+36, 9.96920996838687e+36, 
9.96920996838687e+36, 9.96920996838687e+36, 9.96920996838687e+36, 
9.96920996838687e+36), dts = structure(c(-25202, -24837, -24472, 
-24107, -23741, -23376), class = "Date")), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -6L))

Following the question "R - Replace specific value contents with NA", I tried:

df %>% mutate_at(vars(IMD, CRU), na_if, 9.969210e+36)
df %>% na_if(x=as.vector(df$IMD),y=9.97e36)

Neither of these produces NA values; both return the same data frame as before. Any help is appreciated.

Upvotes: 0

Views: 2319

Answers (1)

MrGumble

Reputation: 5766

na_if works on vectors, not data frames, so your first attempt (the one using mutate_at) is the closer of the two. More importantly, na_if only replaces values that are exactly equal to y. Your very large values are displayed with only 15 significant digits, and I suspect the stored doubles carry more; the value you typed, 9.969210e+36, is rounded further still. As a result, nothing in the columns exactly equals y and nothing is replaced. This is a common problem when testing floating-point values for exact equality.

Also compare the value you typed with the value that dput shows. Which is larger?

9.969210e+36
9.96920996838687e+36
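
To make the comparison concrete, here is a minimal base R sketch using the two numbers above. The variable names are only illustrative, and `stored` is the 15-digit value printed by dput, which may itself be a rounded form of what is actually in the column.

```r
stored <- 9.96920996838687e+36  # value as printed by dput(head(df))
typed  <- 9.969210e+36          # rounded value passed to na_if in the question

exact_match     <- stored == typed                   # FALSE: not the same double
typed_is_larger <- typed > stored                    # TRUE: the typed value is slightly larger
near_match      <- isTRUE(all.equal(stored, typed))  # TRUE: equal within all.equal's default tolerance

print(c(exact_match, typed_is_larger, near_match))
```

Since the typed value is the larger of the two, a test such as `x > 9.969210e+36` would not catch the stored values either, which is why the thresholds used further down are set a little lower.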

You can do it quickly by:

df %>% mutate(
  IMD=ifelse(IMD > 9e36, NA, IMD),
  CRU=ifelse(CRU > 9e36, NA, CRU)
)

Or write a small helper function:

na_when_larger <- function(x, y) {
  x[x > y] <- NA
  x
}

df %>% mutate_at(vars(IMD, CRU), na_when_larger, 9.96e+36)

(Try typing na_if into the console without parentheses to print its source.)
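
If you would rather match the sentinel value itself than everything above a threshold, one option is to compare within a tolerance. The following is a rough base R sketch, not part of dplyr: `na_if_near` and its `tol` argument are hypothetical names, and the small data frame (including the ordinary values 12.5, 3.2 and 7.1) is made up for illustration.

```r
# Hypothetical helper: set values within a relative tolerance of `target` to NA.
na_if_near <- function(x, target, tol = 1e-6) {
  is_near <- !is.na(x) & abs(x - target) <= tol * abs(target)
  x[is_near] <- NA
  x
}

# Made-up example data using the sentinel value from the question.
demo <- data.frame(
  IMD = c(9.96920996838687e+36, 12.5, 9.96920996838687e+36),
  CRU = c(3.2, 9.96920996838687e+36, 7.1)
)

# Apply the helper to both columns; the rounded target is still close enough.
cols <- c("IMD", "CRU")
demo[cols] <- lapply(demo[cols], na_if_near, target = 9.969210e+36)
print(demo)
```

With dplyr, the same helper should be usable in the mutate_at call shown above in place of na_when_larger. The tolerance of 1e-6 is an arbitrary choice; pick one that cannot collide with real measurements in your data.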

Upvotes: 1
