Reputation: 5907
I am working with the R programming language. I have a dataset with both character and numeric variables, and I am trying to replace all NAs and empty values in it with 0. For a continuous variable, the NA/empty value should be replaced with a numeric 0; for a factor variable, it should be replaced with a factor level "0".
In the past, I used a standard command for replacing all NAs with 0 (in the code below, "df" is the data frame containing the data):
df[df == NA] <- 0
I tried the above code on my data, but it did not replace the <NA> values in the factor variables with 0. The <NA>s are still present.
I tried several approaches:
First Approach:
df[is.na(df)] <- 0
But this did not work:
Warning message:
In '[<-.factor'('*tmp*',thisvar, value = 0):
invalid factor level, NA generated
Second Approach: I tried this on one of the factor variables:
library(car)
df$some_factor_var <- recode(df$some_factor_var, "NA = 0")
But this replaced every value within "some_factor_var" with 0.
Third Approach: I tried again on one of the factor variables:
library(forcats)
fct_explicit_na(df$some_factor_var,0)
Error: Can't convert a double vector to a character vector
Can someone please show me how to fix this problem? Is there a way to replace ALL empty/missing/NA values for all variables at once?
Thanks
Upvotes: 1
Views: 1004
Reputation: 5898
With tidyverse, try:
library(tidyverse)
df <- tibble(var_numeric = c(1, 2, 3, NA),
             var_factor = as.factor(c(4, 5, 6, NA)))
df %>%
  replace_na(list(var_numeric = 0)) %>%
  mutate(var_factor = fct_explicit_na(var_factor, "0"))
# A tibble: 4 x 2
  var_numeric var_factor
        <dbl> <fct>
1           1 4
2           2 5
3           3 6
4           0 0
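To cover every column at once instead of naming each one, something along these lines might work. It is only a sketch, and it assumes dplyr 1.0.0 or later (for across() and where()) and forcats 1.0.0 or later, where fct_explicit_na() is deprecated in favor of fct_na_value_to_level(). The name df_filled is just a placeholder.

```r
library(dplyr)
library(forcats)

df <- tibble(var_numeric = c(1, 2, 3, NA),
             var_factor  = as.factor(c(4, 5, 6, NA)))

# Assumption: numeric columns get a numeric 0, factor columns get a "0" level.
df_filled <- df %>%
  mutate(
    # every numeric column: NA becomes 0
    across(where(is.numeric), ~ replace(.x, is.na(.x), 0)),
    # every factor column: NA becomes an explicit level "0"
    across(where(is.factor), ~ fct_na_value_to_level(.x, level = "0"))
  )

df_filled
```

Note that this only touches NA. Empty strings ("") would have to be converted to NA first, for example with dplyr's na_if().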
Upvotes: 2
Reputation: 388982
For factor variables you need to first include the new level (0) in the data if it is not already present.
See this example -
df <- data.frame(a = factor(c(1, NA, 2, 5)), b = 1:4,
                 c = c('a', 'b', 'c', NA), d = c(1, 2, NA, 1))
#Add 0 to the levels of the factor variable "a"
levels(df$a) <- c(levels(df$a), 0)
#Replace NA with 0
df[is.na(df)] <- 0
df
#  a b c d
#1 1 1 a 1
#2 0 2 b 2
#3 2 3 c 0
#4 5 4 0 1
str(df)
#'data.frame': 4 obs. of 4 variables:
# $ a: Factor w/ 4 levels "1","2","5","0": 1 4 2 3
# $ b: int 1 2 3 4
# $ c: chr "a" "b" "c" "0"
# $ d: num 1 2 0 1
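If there are many factor columns, the same idea can be wrapped in a small helper so the levels do not have to be extended by hand. This is only a rough sketch: fill_missing_with_zero is a made-up name, and treating empty strings ("") as missing is an assumption based on the question, not something the code above does.

```r
# Hypothetical helper: replace NA (and "" in text/factor columns) with 0.
fill_missing_with_zero <- function(data) {
  data[] <- lapply(data, function(col) {
    if (is.factor(col)) {
      missing <- is.na(col) | col %in% ""
      # add "0" as a level only if it is not there yet
      levels(col) <- union(levels(col), "0")
      col[missing] <- "0"
    } else if (is.character(col)) {
      col[is.na(col) | col %in% ""] <- "0"
    } else if (is.numeric(col)) {
      col[is.na(col)] <- 0
    }
    col
  })
  data
}

df <- data.frame(a = factor(c(1, NA, 2, 5)), b = 1:4,
                 c = c('a', '', 'c', NA), d = c(1, 2, NA, 1))
filled <- fill_missing_with_zero(df)
filled
```

An empty-string level, if the factor had one, stays in levels() as an unused level; droplevels() can remove it afterwards if that matters.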
Upvotes: 2