stats_noob

Reputation: 5907

R: replacing <NA> within factor variables as 0

I am working with the R programming language. I have a dataset with both character (factor) and numeric variables, and I am trying to replace all NAs and empty values in this data with 0. For a continuous variable, an NA/empty value should be replaced with a numeric 0; for a factor variable, it should be replaced with a factor level "0".

In the past, I used the following command to replace all NAs with 0 (in the code below, "df" is the data frame containing the data):

df[df == NA] <- 0

I tried the above code on my data, but within the factor variables it did not replace the <NA> values with 0; the <NA>s are still present.
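For context (not part of the original question): `df == NA` cannot locate missing values at all, because any comparison with `NA` itself evaluates to `NA`; `is.na()` is the function that tests for missingness. A minimal illustration on a small made-up numeric vector `x`:

```r
x <- c(1, NA, 3)

# Any comparison with NA is NA, so this cannot identify the missing entry
x == NA
#> [1] NA NA NA

# is.na() is the correct test for missing values
is.na(x)
#> [1] FALSE  TRUE FALSE

# Using is.na() as the index replaces the NA with 0
x[is.na(x)] <- 0
```
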

I tried several approaches:

First Approach:

df[is.na(df)] <- 0

But this did not work; it produced this warning:

Warning message:
In `[<-.factor`(`*tmp*`, thisvar, value = 0) :
  invalid factor level, NA generated
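The warning comes from how factors store data: a factor can only hold values that are among its levels, and 0 is not one of them, so R inserts NA instead. A minimal base-R sketch of the failure and of the usual fix (adding "0" as a level before assigning it), using a small made-up factor `f`:

```r
f <- factor(c("a", NA, "b"))

# 0 is not one of f's levels ("a", "b"), so this assignment raises the
# "invalid factor level, NA generated" warning; tryCatch captures its message
msg <- tryCatch({
  f[is.na(f)] <- 0
  "no warning"
}, warning = function(w) conditionMessage(w))
print(msg)

# Adding "0" to the levels first makes the assignment succeed
levels(f) <- c(levels(f), "0")
f[is.na(f)] <- "0"
print(f)
```
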

Second Approach: I tried this on one of the factor variables:

library(car)
df$some_factor_var <- recode(df$some_factor_var, "NA = 0")

But this replaced every value within "some_factor_var" with 0.

Third Approach: I tried again on one of the factor variables:

library(forcats)
fct_explicit_na(df$some_factor_var, 0)

Error: Can't convert a double vector to a character vector

Can someone please show me how to fix this problem? Is there a way to replace ALL empty/missing/NA values for all variables at once?

Thanks

Upvotes: 1

Views: 1004

Answers (2)

Nicolás Velasquez

Reputation: 5898

With the tidyverse, try:

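library(tidyverse)

df <- 
  tibble(var_numeric = c(1, 2, 3, NA),
         var_factor = as.factor(c(4, 5, 6, NA)))

df %>% 
  # replace_na() (tidyr) fills the NAs in the listed numeric column with 0
  replace_na(list(var_numeric = 0)) %>% 
  # fct_explicit_na() (forcats) turns NA into an explicit level; the level must
  # be the string "0", not the number 0 (hence the error in the question).
  # In forcats >= 1.0.0 it is deprecated in favour of fct_na_value_to_level().
  mutate(var_factor = fct_explicit_na(var_factor, "0"))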

# A tibble: 4 x 2
  var_numeric var_factor
        <dbl> <fct>     
1           1 4         
2           2 5         
3           3 6         
4           0 0   

Upvotes: 2

Ronak Shah

Reputation: 388982

For factor variables, you first need to add the new level (0) to the factor's levels if it is not already present; otherwise R cannot store 0 in the factor and generates NA instead.

See this example:

df <- data.frame(a = factor(c(1, NA, 2, 5)), b = 1:4, 
                 c = c('a', 'b', 'c', NA), d = c(1, 2, NA, 1))

# Include 0 in the levels of the factor column "a"
levels(df$a) <- c(levels(df$a), 0)
# Replace every NA with 0
df[is.na(df)] <- 0
df
#  a b c d
#1 1 1 a 1
#2 0 2 b 2
#3 2 3 c 0
#4 5 4 0 1

str(df)
#'data.frame':  4 obs. of  4 variables:
# $ a: Factor w/ 4 levels "1","2","5","0": 1 4 2 3
# $ b: int  1 2 3 4
# $ c: chr  "a" "b" "c" "0"
# $ d: num  1 2 0 1

Upvotes: 2
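Neither answer above treats empty strings as missing, which the question also asks about. A minimal base-R sketch (not from either answer; the helper name `replace_missing_with_zero` is hypothetical) that handles every column at once, treating both NA and "" as missing and replacing them with a 0 of the matching type:

```r
# Hypothetical helper (not from either answer): replace NA, and "" in
# character/factor columns, with a 0 of the matching type.
replace_missing_with_zero <- function(df) {
  df[] <- lapply(df, function(col) {
    if (is.factor(col)) {
      # Work on the labels, then rebuild the factor with "0" added as a level
      chr <- as.character(col)
      chr[is.na(chr) | chr == ""] <- "0"
      factor(chr,
             levels = union(setdiff(levels(col), ""), "0"),
             ordered = is.ordered(col))
    } else if (is.character(col)) {
      col[is.na(col) | col == ""] <- "0"
      col
    } else if (is.numeric(col)) {
      # 0L keeps integer columns integer; 0 is used for doubles
      col[is.na(col)] <- if (is.integer(col)) 0L else 0
      col
    } else {
      col  # other types (logical, Date, ...) are left as they are
    }
  })
  df
}

df <- data.frame(a = factor(c(1, NA, 2, 5)), b = c(1L, NA, 3L, 4L),
                 c = c("a", "", "c", NA), d = c(1, 2, NA, 1),
                 stringsAsFactors = FALSE)
out <- replace_missing_with_zero(df)
str(out)
```

Rebuilding each factor from its labels sidesteps the "invalid factor level" warning, because "0" becomes a level in the same step that assigns it, and `df[] <- lapply(...)` keeps the original data frame class and column names.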
