Reputation: 461
I have a data frame like this:
df <- data.frame(
  v1 = sample(c("L1", "L2"), 5, replace = TRUE),
  v2 = sample(c("F1", "F3"), 5, replace = TRUE),
  v3 = sample(seq(1, 5), 5, replace = TRUE)
)
I want (1) to set the type of the variables, based on a named list:
typs <- list("v1" = "factor", "v2" = "factor", "v3" = "numeric")
and (2) to set the levels of the factor variables:
list.levels <- list("v1" = c("L1","L2","L3"), "v2" = c("F1","F2","F3"))
Ideally, I would like a general approach that can be applied to data frames with an arbitrary number of columns.
Upvotes: 1
Views: 37
Reputation: 206167
You can write your own function to do the cleaning. Here's one possibility:
fix_data <- function(data, types = NULL, flevels = NULL) {
  # Step 1: coerce each column named in `types` to the requested type
  if (!is.null(types) && length(types) > 0) {
    data[, names(types)] <- Map(function(col, type) {
      if (type == "factor") {
        factor(data[[col]])
      } else if (type == "numeric") {
        as.numeric(data[[col]])
      } else {
        stop(paste("unsupported type:", type))
      }
    }, names(types), types)
  }
  # Step 2: rebuild each factor named in `flevels` with the full set of levels
  if (!is.null(flevels) && length(flevels) > 0) {
    data[, names(flevels)] <- Map(function(col, levels) {
      factor(data[[col]], levels = levels)
    }, names(flevels), flevels)
  }
  data
}
And then call it like fix_data(df, typs, list.levels). Note that it returns a new data.frame, so you can either overwrite the original or save it to a new variable.
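The second step of the function relies on how factor() treats its levels argument, so a small sketch of that behaviour may help. The vectors x, g and h below are made up for illustration and are not part of the question's data. The last part shows a caveat that is only an assumption about how the function might be used: if a column is already a factor, as.numeric() returns the internal integer codes rather than the original values.

```r
# Levels that never occur in the data are still kept as valid levels.
x <- c("L1", "L2", "L1")
f <- factor(x, levels = c("L1", "L2", "L3"))
print(levels(f))
print(table(f))

# Values that are missing from `levels` silently become NA.
g <- factor(c("L1", "L9"), levels = c("L1", "L2", "L3"))
print(is.na(g))

# Caveat: as.numeric() on a factor returns the integer codes, not the labels.
h <- factor(c("10", "20"))
codes <- as.numeric(h)
values <- as.numeric(as.character(h))
print(codes)
print(values)
```

If some values in a column are not covered by the supplied levels, checking for NA after the conversion is a cheap way to catch it.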
The basic idea is to loop over the names in your list and apply the appropriate transformation to each column. We use Map to iterate over the names and the values of the list in parallel.
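To see what that parallel iteration looks like in isolation, here is a minimal sketch using the typs list from the question. The helper variable pairs is introduced purely for illustration.

```r
# Same shape as `typs` in the question: names are columns, values are types.
typs <- list(v1 = "factor", v2 = "factor", v3 = "numeric")

# Map() calls the function once per position, pairing names(typs)[i]
# with typs[[i]], and always returns a list.
pairs <- Map(function(col, type) paste(col, "->", type), names(typs), typs)

# Because the first argument is an unnamed character vector, Map() uses
# its values as the names of the result.
print(names(pairs))
print(pairs[["v3"]])
```

Since the result is a list keyed by column name, it can be assigned straight back into the matching columns of the data frame, which is what fix_data does.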
Upvotes: 3