Reputation: 729
I am looking for a way to change the class of the variables in one data frame by using a second data frame that records the intended class of each variable.
I have a data set with around 150 variables, all stored as character. I want to change the class of each variable depending on its type. For this we created a separate data frame that holds the class of each variable. Let me explain with a sample data frame.
Consider my original data frame, df, with 5 variables:
df <- data.frame(A="a",B="1",C="111111",D="d",E="e")
Now we have another data frame "variable_info" which contains just 2 variables, one "variable_name" and another "variable_class".
variable_info <- data.frame(variable_name=c("A","B","C","D","E"),variable_class=c("character","integer","numeric","character","character"))
Now, using the variable_info data frame, I want to change the class of each variable in df to the one specified in variable_info$variable_class, matching the variables by name through variable_info$variable_name.
How can we do this for a data frame? Would it be a good idea to do this in data.table, and if so, how?
Thank you!!
Prasad
Upvotes: 7
Views: 4719
Reputation: 666
This is an adaptation of @talat's answer, updated to use map2 from purrr, which I think is more concise and easier to read and type than Map, together with the .SD syntax from data.table.
This may also make it clearer which parts of the code are arbitrary names and which are not.
library(purrr)
library(data.table)
dtx <- data.table(A="a",B="1",C="111111",D="d",E="e")
variable_info <- data.table(variable_name=c("A","B","C","D","E"),variable_class=c("character","integer","numeric","character","character"))
map_chr(dtx, typeof)
# make sure they are in the same order
variable_info <- variable_info[match(names(dtx), variable_info$variable_name),]
# functions to apply
funs <- sapply(paste0("as.", variable_info$variable_class), match.fun)
# apply'em
dtx[, names(dtx) := map2(.SD, funs, ~ .y(as.character(.x)))]
map_chr(dtx, typeof)
Upvotes: 0
Reputation: 2210
An alternative approach is to use a function. This function takes any pair of data frames, finds their common columns, and assigns the column classes of the first to the matching columns of the second.
matchColClasses <- function(df1, df2) {
  # Purpose: protect joins from column type mismatches - a problem with multi-column empty df
  # Input: df1 - master for class assignments, df2 - for col reclass and return.
  # Output: df2 with shared columns classed to match df1
  # Usage: df2 <- matchColClasses(df1, df2)
  sharedColNames <- names(df1)[names(df1) %in% names(df2)]
  # df1[sharedColNames] stays a data frame even when only one column is shared
  sharedColTypes <- lapply(df1[sharedColNames], class)
  for (n in sharedColNames) {
    # class<- coerces between basic vector types (e.g. character to integer)
    class(df2[[n]]) <- sharedColTypes[[n]]
  }
  return(df2)
}
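As a rough sketch of how this helper could be applied to the question's data, one option is to build an empty "template" data frame that carries the desired column types and pass it as df1. The `template` object is a hypothetical name introduced here for illustration, and the helper is restated with `[[` indexing so the block runs on its own. Note that `class<-` only coerces between basic vector types, so this sketch does not cover classes such as factor or Date.

```r
# Helper restated so the example is self-contained.
matchColClasses <- function(df1, df2) {
  sharedColNames <- names(df1)[names(df1) %in% names(df2)]
  sharedColTypes <- lapply(df1[sharedColNames], class)
  for (n in sharedColNames) {
    class(df2[[n]]) <- sharedColTypes[[n]]
  }
  df2
}

# Hypothetical zero-row template that only carries the desired column types.
template <- data.frame(A = character(), B = integer(), C = numeric(),
                       stringsAsFactors = FALSE)

# Sample data from the question; every column starts out as character.
df <- data.frame(A = "a", B = "1", C = "111111", D = "d", E = "e",
                 stringsAsFactors = FALSE)

converted <- matchColClasses(template, df)

# Columns D and E are not in the template, so they are left untouched.
classes <- sapply(converted, class)
print(classes)
```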
Upvotes: 0
Reputation: 70266
You could try it like this:
Make sure both tables are in the same order:
variable_info <- variable_info[match(names(df), variable_info$variable_name),]
Create a list of conversion functions:
funs <- sapply(paste0("as.", variable_info$variable_class), match.fun)
Then map them to each column:
df[] <- Map(function(dd, f) f(as.character(dd)), df, funs)
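Put together, the three steps can be sketched as one runnable script on the question's sample data. This is only an illustration under a few assumptions: `stringsAsFactors = FALSE` is set explicitly so the behaviour does not depend on the R version, the lookup rows are aligned with `match(names(df), variable_info$variable_name)` so that they follow the column order of df, and the final class check is an addition that is not part of the answer.

```r
# Sample data from the question; every column starts out as character.
df <- data.frame(A = "a", B = "1", C = "111111", D = "d", E = "e",
                 stringsAsFactors = FALSE)
variable_info <- data.frame(
  variable_name  = c("A", "B", "C", "D", "E"),
  variable_class = c("character", "integer", "numeric", "character", "character"),
  stringsAsFactors = FALSE
)

# Reorder the lookup table so its rows line up with the columns of df.
variable_info <- variable_info[match(names(df), variable_info$variable_name), ]

# Look up one conversion function per column: as.character, as.integer, ...
funs <- lapply(paste0("as.", variable_info$variable_class), match.fun)

# Apply each function to its column; df[] keeps the data frame structure.
df[] <- Map(function(dd, f) f(as.character(dd)), df, funs)

# Added check: inspect the resulting class of every column.
result <- sapply(df, class)
print(result)
```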
With data.table you could do it almost the same way, except that you replace the last line with:
library(data.table)
dt <- as.data.table(df) # or use setDT(df)
dt[, names(dt) := Map(function(dd, f) f(as.character(dd)), dt, funs)]
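For comparison, here is a minimal sketch of a different data.table variant: it loops over the rows of variable_info and updates each column by reference with `set()`, instead of using the `Map` call above. Because each column is looked up by name, this version does not rely on the two tables being in the same order. The loop variables `k`, `col` and `f` are hypothetical names introduced for illustration.

```r
library(data.table)

# Sample data from the question; every column starts out as character.
dt <- data.table(A = "a", B = "1", C = "111111", D = "d", E = "e")
variable_info <- data.table(
  variable_name  = c("A", "B", "C", "D", "E"),
  variable_class = c("character", "integer", "numeric", "character", "character")
)

for (k in seq_len(nrow(variable_info))) {
  col <- variable_info$variable_name[k]
  # Resolve the conversion function by name, e.g. "as.integer".
  f <- match.fun(paste0("as.", variable_info$variable_class[k]))
  # Replace the whole column by reference with its converted version.
  set(dt, j = col, value = f(dt[[col]]))
}

classes <- sapply(dt, class)
print(classes)
```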
Upvotes: 6