R: set variable types and levels in a data frame based on a list

Question

I have a data frame like this:

df <- data.frame(
  v1 = sample(c("L1","L2"), 5, replace = TRUE),
  v2 = sample(c("F1","F3"), 5, replace = TRUE),
  v3 = sample(seq(1,5), 5, replace = TRUE)
)

I want (1) to set the type of each variable based on a named list:

typs <- list("v1" = "factor", "v2" = "factor", "v3" = "numeric")

and (2) to set the levels of the factor variables:

list.levels <- list("v1" = c("L1","L2","L3"), "v2" = c("F1","F2","F3"))

Ideally, I would like a general approach that works for data frames with any number of columns.

MrFlick · Accepted Answer

You just need to write your own function to do the cleaning. Here's one possibility:

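fix_data <- function(data, types=NULL, flevels=NULL) {
  # (1) convert each column named in `types` to the requested type
  if(!is.null(types) && length(types)>0) {
    data[,names(types)] <- Map(function(col, type) {
      if (type=="factor") {
        factor(data[[col]])
      } else if (type=="numeric") {
        # note: on a factor column this returns the level codes, not the labels
        as.numeric(data[[col]])
      } else {
        stop(paste("unsupported type:", type))
      }
    }, names(types), types)
  }
  # (2) rebuild each listed factor with the requested set of levels;
  #     values not listed in `levels` become NA
  if(!is.null(flevels) && length(flevels)>0) {
    data[,names(flevels)] <- Map(function(col, levels) {
      factor(data[[col]], levels=levels)
    }, names(flevels), flevels)
  }
  # return the modified copy; the caller's data frame is left unchanged
  data
}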

Then call it as fix_data(df, typs, list.levels). Note that it returns a new data.frame, so you can either overwrite the original or assign the result to a new variable.
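If an explicit loop reads more clearly to you than Map, the same two steps can be sketched with for loops over names(). This is an alternative formulation, not the answer's code: fix_data_loop is a hypothetical name, switch() replaces the if/else chain, and the data below is a fixed (non-random) version of the question's df so the result is reproducible.

```r
# Hypothetical loop-based variant of fix_data(): same idea, explicit for loops.
fix_data_loop <- function(data, types = list(), flevels = list()) {
  # (1) convert each named column to the requested type;
  #     the unnamed last switch() argument is the fallback for unknown types
  for (col in names(types)) {
    data[[col]] <- switch(types[[col]],
      factor  = factor(data[[col]]),
      numeric = as.numeric(data[[col]]),
      stop("unsupported type: ", types[[col]])
    )
  }
  # (2) rebuild each named factor with the requested levels;
  #     values not listed in the levels become NA
  for (col in names(flevels)) {
    data[[col]] <- factor(data[[col]], levels = flevels[[col]])
  }
  data
}

# Fixed version of the question's data (no sample(), so it is reproducible)
df <- data.frame(
  v1 = c("L1", "L2", "L2", "L1", "L2"),
  v2 = c("F1", "F3", "F1", "F3", "F3"),
  v3 = c(2L, 5L, 1L, 4L, 3L),
  stringsAsFactors = FALSE
)
typs <- list("v1" = "factor", "v2" = "factor", "v3" = "numeric")
list.levels <- list("v1" = c("L1", "L2", "L3"), "v2" = c("F1", "F2", "F3"))

out <- fix_data_loop(df, typs, list.levels)
levels(out$v1)
# [1] "L1" "L2" "L3"
class(out$v3)
# [1] "numeric"
```

Because both loops only touch the columns named in the lists, the number of columns in the data frame does not matter; unused levels such as "L3" are kept in the factor even though no row contains them.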

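The basic idea is to loop over the names in each list and apply the matching transformation. We use Map to iterate over the names and the values of the list in parallel; it returns a list with one new column per name, which is then assigned back into data[, names(...)].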
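To see what Map hands back to that assignment, here is a small standalone illustration using the question's typs list. The paste() body is only a stand-in for the real conversion done in fix_data().

```r
typs <- list("v1" = "factor", "v2" = "factor", "v3" = "numeric")

# Map(f, x, y) calls f(x[[i]], y[[i]]) for each i and always returns a list.
# Because names(typs) is an unnamed character vector, Map uses it as the
# names of the result.
res <- Map(function(col, type) paste(col, "->", type), names(typs), typs)

# res is list(v1 = "v1 -> factor", v2 = "v2 -> factor", v3 = "v3 -> numeric")
res$v3
# [1] "v3 -> numeric"
```

In fix_data() each element is a converted column instead of a string, so the list lines up element by element with the columns selected by data[, names(types)].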
