Reputation: 15
I want to convert the NAs in my factor variables into a string "None" that will be a level in my data set.
I have tried:
for ( col in 1:ncol(data)){
class(data$col) == "factor"
data$col = addNA(data$col)
levels(data$col) <- c(levels(data$col), "None")
print(summary(data))
}
And I got this error:
Unknown or uninitialised column: `col`.
Unknown or uninitialised column: `col`.
Error: Assigned data `addNA(cdata$col)` must be compatible with existing data.
x Existing data has 1000 rows.
x Assigned data has 0 rows.
i Only vectors of size 1 are recycled.
What is wrong with this approach? Is there a better way to do this for all factor columns at once, rather than handling each column separately?
Upvotes: 1
Views: 74
Reputation: 79184
Here is an alternative way, shown with a mock dataset:
# identify which is factor column
x <- sapply(df, is.factor)
df[, x] <- lapply(df[, x], function(.){
levels(.) <- c(levels(.), "None")
replace(., is.na(.), "None")
})
Output:
  a     b         c
  <fct> <fct> <dbl>
1 1     None      2
2 None  3        NA
3 4     None     NA
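One caveat, in case the data is a plain data.frame rather than a tibble: with exactly one factor column, `df[, x]` drops to a bare vector, and `lapply()` would then iterate over its elements instead of over columns. A minimal base R sketch that sidesteps this with list-style indexing (`df2[is_fct]`); the helper name `na_to_none`, the object names and the toy data are made up for illustration:

```r
# Hypothetical helper (not from the answer above): add a "None" level
# to a factor and move its NA entries onto that level.
na_to_none <- function(f) {
  levels(f) <- c(levels(f), "None")
  replace(f, is.na(f), "None")
}

# Toy data with a single factor column, as a plain data.frame.
df2 <- data.frame(
  a = factor(c("1", NA, "4")),
  c = c(2, NA, NA)
)

is_fct <- sapply(df2, is.factor)

# df2[is_fct] stays a data.frame even when only one column is selected,
# whereas df2[, is_fct] would drop to a vector here.
df2[is_fct] <- lapply(df2[is_fct], na_to_none)

levels(df2$a)
sum(is.na(df2$a))
```

The numeric column `c` keeps its NAs, since only factor columns are selected.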
data:
df <- structure(list(a = structure(c(1L, NA, 2L), .Label = c("1", "4"
), class = "factor"), b = structure(c(NA, 1L, NA), .Label = "3", class = "factor"),
c = c(2, NA, NA)), row.names = c(NA, -3L), class = c("tbl_df",
"tbl", "data.frame"))
Upvotes: 1
Reputation: 887651
We can loop across the columns that are factor and convert the NA to "None" using fct_explicit_na from forcats:
library(dplyr)
library(forcats)
data <- data %>%
mutate(across(where(is.factor), ~ fct_explicit_na(., na_level = "None")))
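A note for newer installations: forcats 1.0.0 deprecated `fct_explicit_na()` in favour of `fct_na_value_to_level()`. Assuming forcats >= 1.0.0 and a recent dplyr are installed, the same step could be sketched roughly like this (the toy tibble is made up for illustration):

```r
library(dplyr)
library(forcats)

# Made-up example data: two factor columns with NAs and one numeric column.
data <- tibble(
  a = factor(c("1", NA, "4")),
  b = factor(c(NA, "3", NA)),
  c = c(2, NA, NA)
)

# Turn NA values in every factor column into an explicit "None" level;
# non-factor columns (here `c`) are left untouched.
data <- data %>%
  mutate(across(where(is.factor), ~ fct_na_value_to_level(.x, level = "None")))

levels(data$a)
```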
In the for loop, there are multiple issues:
1. class(data$col) == "factor" is checked, but it should be inside an if(...) expression
2. data$col is wrong as there are no column names with col as name; instead it would be data[[col]]
3. summary(data) can be checked outside the for loop
for (col in seq_along(data)){
    if (is.factor(data[[col]])) {
        # addNA() turns NA into an extra level; rename that level to "None"
        data[[col]] <- addNA(data[[col]])
        levels(data[[col]])[is.na(levels(data[[col]]))] <- "None"
    }
}
print(summary(data))
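To see why the original loop failed: `$` takes its right-hand side literally, so `data$col` looks for a column named "col" and never consults the loop variable. A small base R sketch on a made-up data frame (all object names here are illustrative):

```r
# Made-up data frame; none of its columns is called "col".
toy <- data.frame(a = factor(c("x", NA)), b = c(1, 2))

col <- 1

# `$` does not evaluate `col`; it looks up a column literally named "col".
by_dollar <- toy$col

# `[[` evaluates `col` first, so this selects the first column.
by_brackets <- toy[[col]]

is.null(by_dollar)
is.factor(by_brackets)
```

On a tibble, the same `$` lookup also emits the "Unknown or uninitialised column" warning seen in the question, and the empty result is what then fails the row-count check on assignment.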
Upvotes: 1