Reputation: 5897
I am working with R. I have a dataset with both character and numeric variables, and I am trying to replace all NAs and empty values in it with "0".
Recently, I learned how to replace NA values within factor variables with 0 (R: replacing <NA> within factor variables as 0):
# "df" is the dataset, "a" is the variable
# Add 0 to the levels of variable "a"
levels(df$a) <- c(levels(df$a), 0)
# Replace NA with 0
df[is.na(df)] <- 0
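The reason the level has to be added first can be seen on a small factor: assigning a value that is not among the levels yields NA (with an "invalid factor level" warning) instead of 0. A minimal sketch, where the vector a is made up for illustration:

```r
# A small factor whose levels ("x", "y") do not include "0"
a <- factor(c("x", NA, "y"))

# Without "0" among the levels the assignment fails:
# R warns "invalid factor level, NA generated" and the value stays NA
a_bad <- a
suppressWarnings(a_bad[is.na(a_bad)] <- 0)
print(a_bad)

# After adding "0" to the levels, the same assignment works
a_ok <- a
levels(a_ok) <- c(levels(a_ok), 0)
a_ok[is.na(a_ok)] <- 0
print(a_ok)
```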
Now, I am trying to learn how to apply this command to every factor variable within "df".
I learned how to identify all columns that contain "factor" variables:
is.fact <- sapply(df, is.factor)
From here, is there a way to run this command
levels(df$a) <- c(levels(df$a), 0)
for every factor variable in the data?
Currently, I am planning to manually repeat this command for every variable, e.g.:
levels(df$a) <- c(levels(df$a), 0)
levels(df$b) <- c(levels(df$b), 0)
levels(df$c) <- c(levels(df$c), 0)
# etc.
And then run the following line:
df[is.na(df)] <- 0
But I am looking for a quicker way to do this. Does anyone know how?
Thanks
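For comparison with the manual approach described above, the repetition can be automated with a plain for loop over the column names flagged by is.fact. A sketch on a small data frame whose columns a, b and c are made up for illustration:

```r
# Small example data: two factor columns and one character column
df <- data.frame(
  a = factor(c(1, 2, NA)),
  b = c("x", NA, "z"),
  c = factor(c(NA, "p", "q"))
)

# Named logical vector: TRUE for factor columns
is.fact <- sapply(df, is.factor)

# Add "0" as a level to each factor column, one column at a time
for (col in names(df)[is.fact]) {
  levels(df[[col]]) <- c(levels(df[[col]]), 0)
}

# Every factor now accepts 0, so one assignment replaces all NAs
df[is.na(df)] <- 0
print(df)
```

The loop only touches the factor columns; the final df[is.na(df)] <- 0 also turns the NA in the character column b into "0".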
Upvotes: 0
Views: 876
Reputation: 430
You can call lapply once to process every column. For each column, check whether it is a factor: if it is, add 0 to its levels; otherwise, return the column unchanged.
Here is the complete code with a sample data.frame:
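df <- data.frame(
  a = factor(c(1, 2, 4, NA, 5, 6)),
  b = c("a", "b", "c", NA, "e", "f"),
  c = factor(c(NA, 1, 2, 3, 4, 5))
)
replaceNA <- function(df) {
  # Add 0 as a level to every factor column; other columns are returned as-is.
  # Assigning into df[] keeps the result a data.frame (lapply alone returns a list).
  df[] <- lapply(df, function(col) {
    if (is.factor(col)) {
      levels(col) <- c(levels(col), 0)
    }
    col
  })
  # 0 is now a valid level in every factor, so all NAs can be replaced at once
  df[is.na(df)] <- 0
  df
}
replaceNA(df)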
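Why the result of lapply() needs care: lapply() always returns a plain list, even when given a data.frame, and is.na() on a list only flags elements that are themselves a single NA, so it cannot find the NAs inside the columns. Assigning into df[] keeps the data.frame class. A small check on a made-up two-column data frame:

```r
df <- data.frame(a = factor(c("x", NA)), b = c(1, NA))

# lapply() returns a plain list, not a data.frame
out <- lapply(df, identity)
print(class(out))    # "list"

# is.na() on a list tests each element as a whole; both columns have
# length 2, so neither is flagged even though each contains an NA
print(is.na(out))    # a: FALSE, b: FALSE

# Assigning into df2[] replaces the columns but keeps the data.frame class
df2 <- df
df2[] <- lapply(df2, identity)
print(class(df2))    # "data.frame"
```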
Upvotes: 2
Reputation: 101199
Here is a base R option (borrowing df from @Martin Gal's answer):
list2DF(
  lapply(
    df,
    function(x) {
      if (is.factor(x)) {
        replace(`levels<-`(x, c(levels(x), 0)), is.na(x), 0)
      } else {
        x
      }
    }
  )
)
which gives
a b A
1 k x d
2 u e b
3 d h o
4 y s t
5 j y u
6 t k m
7 j 0 i
8 p 0 e
9 o z d
10 0 s t
11 o a v
12 h q t
13 c d g
14 m b o
15 b d b
16 0 y j
17 w 0 h
18 n t b
19 i 0 <NA>
20 b z x
21 g 0 g
22 h d s
23 v a j
24 w 0 b
25 y 0 c
26 n 0 i
27 l j b
28 g 0 b
29 f h h
30 0 0 i
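list2DF() was only added in R 4.0.0. On older versions, the same per-column function can be applied by assigning into df[], which keeps the data.frame. A sketch, where add_zero is a hypothetical helper name and the three-row df is a made-up excerpt in the same shape as the data above:

```r
df <- data.frame(
  a = factor(c("k", NA, "d")),
  A = c("d", NA, "o")
)

# Hypothetical helper: for factors, add "0" as a level and put it where
# the value was NA; any other column is returned unchanged
add_zero <- function(x) {
  if (is.factor(x)) {
    replace(`levels<-`(x, c(levels(x), 0)), is.na(x), 0)
  } else {
    x
  }
}

res <- df
res[] <- lapply(res, add_zero)
print(res)
```

As in the answer above, only factor columns are filled; the NA in the character column A is left alone.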
Upvotes: 2
Reputation: 16978
You could use dplyr and tidyr for this task:
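library(dplyr)
library(tidyr)
df %>%
  as_tibble() %>%
  mutate(across(
    where(is.factor),
    # add "0" as a level, then fill NAs with it ("0" as a string, since
    # tidyr >= 1.2.0 casts the replacement to the column type)
    ~ replace_na(`levels<-`(.x, c(levels(.x), "0")), "0")
  ))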
The main idea is to use dplyr's across() with where(is.factor) to select every factor column and apply your function to each one. The second idea is that the replacement call
levels(df$a) <- c(levels(df$a), 0)
is shorthand for
df$a <- `levels<-`(df$a, c(levels(df$a), 0))
so `levels<-`(df$a, c(levels(df$a), 0)) on its own simply returns the modified factor.
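A quick check of how the assignment form relates to calling `levels<-` directly, on a small made-up factor x: the direct call returns a modified copy and leaves x alone, while the assignment form rebinds x to that same result.

```r
x <- factor(c("a", NA, "b"))

# Calling the replacement function directly returns a modified copy...
y <- `levels<-`(x, c(levels(x), "0"))
print(levels(y))   # "a" "b" "0"
print(levels(x))   # "a" "b"  (x is untouched)

# ...while the assignment form rebinds x; it is shorthand for
# x <- `levels<-`(x, c(levels(x), "0"))
levels(x) <- c(levels(x), "0")
print(identical(x, y))   # TRUE
```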
Replacement functions such as `levels<-` are ordinary functions, so they can be called inside a pipe or an across() lambda. The code, applied to
df <- structure(list(a = structure(c(9L, 16L, 3L, 19L, 8L, 15L, 8L,
14L, 13L, NA, 13L, 6L, 2L, 11L, 1L, NA, 18L, 12L, 7L, 1L, 5L,
6L, 17L, 18L, 19L, 12L, 10L, 5L, 4L, NA), .Label = c("b", "c",
"d", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "t",
"u", "v", "w", "y"), class = "factor"), b = structure(c(11L,
4L, 5L, 9L, 12L, 7L, NA, NA, 13L, 9L, 1L, 8L, 3L, 2L, 3L, 12L,
NA, 10L, NA, 13L, NA, 3L, 1L, NA, NA, NA, 6L, NA, 5L, NA), .Label = c("a",
"b", "d", "e", "h", "j", "k", "q", "s", "t", "x", "y", "z"), class = "factor"),
A = c("d", "b", "o", "t", "u", "m", "i", "e", "d", "t", "v",
"t", "g", "o", "b", "j", "h", "b", NA, "x", "g", "s", "j",
"b", "c", "i", "b", "b", "h", "i")), class = "data.frame", row.names = c(NA,
-30L))
returns
# A tibble: 30 x 3
a b A
<fct> <fct> <chr>
1 k x d
2 u e b
3 d h o
4 y s t
5 j y u
6 t k m
7 j 0 i
8 p 0 e
9 o z d
10 0 s t
11 o a v
12 h q t
13 c d g
14 m b o
15 b d b
16 0 y j
17 w 0 h
18 n t b
19 i 0 NA
20 b z x
21 g 0 g
22 h d s
23 v a j
24 w 0 b
25 y 0 c
26 n 0 i
27 l j b
28 g 0 b
29 f h h
30 0 0 i
Upvotes: 2