Reputation: 5897

R: Apply function to only "factor" variables

I am working with R. I have a dataset with both character and numeric variables - I am trying to replace all NA's and empty values in this data with "0".

Recently, I learned how to replace NA values within a factor variable with 0 (see "R: replacing <NA> within factor variables as 0"):

# "df" is the dataset, "a" is the variable
#Include 0 in the levels for "a" variable
levels(df$a) <- c(levels(df$a), 0)
#Replace NA to 0
df[is.na(df)] <- 0

Now, I am trying to learn how to apply this command on every factor variable within "df".

I learned how to identify all columns that contain "factor" variables:

is.fact <- sapply(df, is.factor)

From here, is there a way to run this command

levels(df$a) <- c(levels(df$a), 0)

for every factor variable in the data?

Currently, I was planning on manually rewriting this command for all the variables, e.g.:

levels(df$a) <- c(levels(df$a), 0)
levels(df$b) <- c(levels(df$b), 0)
levels(df$c) <- c(levels(df$c), 0)

etc

And then run the following line:

df[is.na(df)] <- 0

But I was trying to find a quicker way to do this.

Does anyone know how to do this? Can someone please show me a quicker way to solve this problem?

Thanks

Upvotes: 0

Answers (3)

waterloos

Reputation: 430

You can call lapply once and process every column. While processing each column, check whether it is a factor. If it is, add the new level; if not, return the column unchanged.

Here is the complete code with a sample data.frame.

df <- data.frame(
    a = factor(c(1, 2, 4, NA, 5, 6)),
    b = c("a", "b", "c", NA, "e", "f"),
    c = factor(c(NA, 1, 2, 3, 4, 5))
)

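replaceNA <- function(df) {
    # df[] <- keeps the data.frame class; a bare lapply() would return a plain list
    df[] <- lapply(df, function(col) {
        if (is.factor(col)) {
            # Add "0" as a valid level so it can be assigned later
            levels(col) <- c(levels(col), 0)
        }
        col
    })
    # is.na() on a data.frame gives a logical matrix, so this covers every column
    df[is.na(df)] <- 0
    df
}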

replaceNA(df)
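
The logical index is.fact from the question can also be used directly, so that only the factor columns are touched. The following is a minimal sketch under that assumption; the sample data and its column names are made up for illustration.

```r
# Hypothetical sample data, only for illustration
df <- data.frame(
  a = factor(c("u", NA, "v")),
  b = c(1.5, NA, 3),
  c = factor(c(NA, "k", "k"))
)

# Logical index of the factor columns, as in the question
is.fact <- sapply(df, is.factor)

# Add "0" as a level to the factor columns only
df[is.fact] <- lapply(df[is.fact], function(x) {
  levels(x) <- c(levels(x), "0")
  x
})

# With the level in place, the one-liner from the question works on the whole data frame
df[is.na(df)] <- 0

str(df)
```

Assigning into df[is.fact] keeps df a data.frame and leaves the non-factor columns as they were until the final replacement step.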

Upvotes: 2

ThomasIsCoding

Reputation: 101199

Here is a base R option (borrowing df from @Martin Gal's answer; note that list2DF requires R >= 4.0.0)

list2DF(
  lapply(
    df,
    function(x) {
      if (is.factor(x)) {
        replace(`levels<-`(x, c(levels(x), 0)), is.na(x), 0)
      } else {
        x
      }
    }
  )
)

which gives

   a b    A
1  k x    d
2  u e    b
3  d h    o
4  y s    t
5  j y    u
6  t k    m
7  j 0    i
8  p 0    e
9  o z    d
10 0 s    t
11 o a    v
12 h q    t
13 c d    g
14 m b    o
15 b d    b
16 0 y    j
17 w 0    h
18 n t    b
19 i 0 <NA>
20 b z    x
21 g 0    g
22 h d    s
23 v a    j
24 w 0    b
25 y 0    c
26 n 0    i
27 l j    b
28 g 0    b
29 f h    h
30 0 0    i

Upvotes: 2

Martin Gal

Reputation: 16978

You could use dplyr and tidyr for this task:

library(dplyr)
library(tidyr)

df %>% 
  tibble() %>% 
  # "0" is passed as a string: tidyr 1.2.0 and later cast the replacement to the column's type
  mutate(across(where(is.factor), ~replace_na(`levels<-`(.x, c(levels(.x), 0)), "0")))

The main idea is to use dplyr's across() together with where(is.factor) to select every factor column and apply your function to each of them. The second idea used here is the fact that

levels(df$a) <- c(levels(df$a), 0)

is the same as

`levels<-`(df$a, c(levels(df$a), 0))

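The assignment form is syntactic sugar: R translates it into a call to the replacement function `levels<-`, which returns the modified factor instead of changing df$a in place. That makes it usable inside a pipe or a lambda. So the code applied to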

df <- structure(list(a = structure(c(9L, 16L, 3L, 19L, 8L, 15L, 8L, 
14L, 13L, NA, 13L, 6L, 2L, 11L, 1L, NA, 18L, 12L, 7L, 1L, 5L, 
6L, 17L, 18L, 19L, 12L, 10L, 5L, 4L, NA), .Label = c("b", "c", 
"d", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "t", 
"u", "v", "w", "y"), class = "factor"), b = structure(c(11L, 
4L, 5L, 9L, 12L, 7L, NA, NA, 13L, 9L, 1L, 8L, 3L, 2L, 3L, 12L, 
NA, 10L, NA, 13L, NA, 3L, 1L, NA, NA, NA, 6L, NA, 5L, NA), .Label = c("a", 
"b", "d", "e", "h", "j", "k", "q", "s", "t", "x", "y", "z"), class = "factor"), 
    A = c("d", "b", "o", "t", "u", "m", "i", "e", "d", "t", "v", 
    "t", "g", "o", "b", "j", "h", "b", NA, "x", "g", "s", "j", 
    "b", "c", "i", "b", "b", "h", "i")), class = "data.frame", row.names = c(NA, 
-30L))

returns

# A tibble: 30 x 3
   a     b     A    
   <fct> <fct> <chr>
 1 k     x     d    
 2 u     e     b    
 3 d     h     o    
 4 y     s     t    
 5 j     y     u    
 6 t     k     m    
 7 j     0     i    
 8 p     0     e    
 9 o     z     d    
10 0     s     t    
11 o     a     v    
12 h     q     t    
13 c     d     g    
14 m     b     o    
15 b     d     b    
16 0     y     j    
17 w     0     h    
18 n     t     b    
19 i     0     NA   
20 b     z     x    
21 g     0     g    
22 h     d     s    
23 v     a     j    
24 w     0     b    
25 y     0     c    
26 n     0     i    
27 l     j     b    
28 g     0     b    
29 f     h     h    
30 0     0     i
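
The equivalence between the assignment form and the `levels<-` call mentioned above can be checked on a small factor. This is just an illustrative sketch; the names f, g and h are made up.

```r
f <- factor(c("x", NA, "y"))

# Assignment form: modifies the copy g
g <- f
levels(g) <- c(levels(g), "0")

# Function-call form: returns the modified factor and leaves f as it was
h <- `levels<-`(f, c(levels(f), "0"))

identical(g, h)  # compares the two results
levels(f)        # levels of the original factor
```

Because the function-call form returns a value rather than assigning it, its result has to be stored or passed on, which is what happens inside mutate().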

Upvotes: 2
