Reputation: 1413
Suppose I want to log-transform columns in a data frame, say the iris data, and create new columns with the suffix _log dynamically for each desired column.
What I am trying to achieve is:
df$Sepal.Length_log <- log (df$Sepal.Length)
df$Sepal.Width_log <- log (df$Sepal.Width)
df$Petal.Length_log <- log (df$Petal.Length)
df$Petal.Width_log <- log (df$Petal.Width)
But this becomes tedious when the data have many columns to transform, so I want to do it dynamically using a loop and the mutate function from the dplyr package. My unsuccessful naive attempt was:
library (dplyr)
data(iris)
varLabel <- c('Sepal.Length','Sepal.Width','Petal.Length','Petal.Width')
for (i in 1:length (varLabel)) {
varNew <- paste (varLabel[i],'log',sep='_')
iris <- dplyr::mutate (iris,varNew=log (varLabel[i])) # problem arises here
}
I get this error: Error: non-numeric argument to mathematical function
I searched for a solution and the most relevant one seems to be this tutorial on standard and non-standard evaluation, this post and that one also, but I couldn't figure out how to borrow a solution from there. Any help would be much appreciated.
Note:
I want to have both old and new columns in the data set.
Upvotes: 2
Views: 1161
Reputation: 887048
We can use mutate_each
nm1 <- paste0("varNew_", varLabel)
res <- iris %>%
mutate_each_(funs(log(.)), varLabel) %>%
setNames(., c(nm1, setdiff(names(.), varLabel))) %>%
bind_cols(iris[intersect(names(iris), varLabel)], .)
head(res,2)
#Source: local data frame [2 x 9]
# Sepal.Length Sepal.Width Petal.Length Petal.Width varNew_Sepal.Length varNew_Sepal.Width varNew_Petal.Length varNew_Petal.Width Species
# (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (fctr)
#1 5.1 3.5 1.4 0.2 1.629241 1.252763 0.3364722 -1.609438 setosa
#2 4.9 3.0 1.4 0.2 1.589235 1.098612 0.3364722 -1.609438 setosa
If the OP is looking for a base R solution, this would also work:
iris[nm1] <- log(iris[varLabel])
head(iris,2)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species varNew_Sepal.Length
#1 5.1 3.5 1.4 0.2 setosa 1.629241
#2 4.9 3.0 1.4 0.2 setosa 1.589235
# varNew_Sepal.Width varNew_Petal.Length varNew_Petal.Width
#1 1.252763 0.3364722 -1.609438
#2 1.098612 0.3364722 -1.609438
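For reference, the loop in the question fails for two reasons: `log(varLabel[i])` takes the log of a character string rather than of the column it names (hence "non-numeric argument to mathematical function"), and `varNew=` inside `mutate` would create a column literally called `varNew`. A minimal sketch of the same loop in base R, using `[[ ]]` to look columns up by name, is below. The copy `df` and the loop variable `v` are illustrative names, not part of the original post.

```r
# Work on a copy so the built-in iris data stays untouched
df <- datasets::iris
varLabel <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

for (v in varLabel) {
  # Build the new column name, e.g. "Sepal.Length_log"
  varNew <- paste(v, "log", sep = "_")
  # [[ ]] resolves the column from the string held in v / varNew
  df[[varNew]] <- log(df[[v]])
}

names(df)
```

The original columns are kept, and the four `_log` columns are appended after `Species`.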
Upvotes: 2
Reputation: 6542
A solution with data.table:
library(data.table)
data(iris)
DT <- as.data.table(iris)
varLabel <- c('Sepal.Length','Sepal.Width','Petal.Length','Petal.Width')
NewColumn <- paste0(varLabel, "_log")
DT[, (NewColumn) := lapply(.SD, log), .SDcols = varLabel]
DT
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1: 5.1 3.5 1.4 0.2 setosa
#> 2: 4.9 3.0 1.4 0.2 setosa
#> 3: 4.7 3.2 1.3 0.2 setosa
#> 4: 4.6 3.1 1.5 0.2 setosa
#> 5: 5.0 3.6 1.4 0.2 setosa
#> ---
#> 146: 6.7 3.0 5.2 2.3 virginica
#> 147: 6.3 2.5 5.0 1.9 virginica
#> 148: 6.5 3.0 5.2 2.0 virginica
#> 149: 6.2 3.4 5.4 2.3 virginica
#> 150: 5.9 3.0 5.1 1.8 virginica
#> Sepal.Length_log Sepal.Width_log Petal.Length_log Petal.Width_log
#> 1: 1.629241 1.2527630 0.3364722 -1.6094379
#> 2: 1.589235 1.0986123 0.3364722 -1.6094379
#> 3: 1.547563 1.1631508 0.2623643 -1.6094379
#> 4: 1.526056 1.1314021 0.4054651 -1.6094379
#> 5: 1.609438 1.2809338 0.3364722 -1.6094379
#> ---
#> 146: 1.902108 1.0986123 1.6486586 0.8329091
#> 147: 1.840550 0.9162907 1.6094379 0.6418539
#> 148: 1.871802 1.0986123 1.6486586 0.6931472
#> 149: 1.824549 1.2237754 1.6863990 0.8329091
#> 150: 1.774952 1.0986123 1.6292405 0.5877867
A short solution with dplyr and mutate_each. Just use a named vector to keep all variables:
library(dplyr)
data(iris)
varLabel <- c('Sepal.Length','Sepal.Width','Petal.Length','Petal.Width')
names(varLabel) <- paste0(varLabel,'_log')
res <- iris %>% mutate_each_(funs(log(.)), vars = varLabel)
head(res)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3.0 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5.0 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
#> Sepal.Length_log Sepal.Width_log Petal.Length_log Petal.Width_log
#> 1 1.629241 1.252763 0.3364722 -1.6094379
#> 2 1.589235 1.098612 0.3364722 -1.6094379
#> 3 1.547563 1.163151 0.2623643 -1.6094379
#> 4 1.526056 1.131402 0.4054651 -1.6094379
#> 5 1.609438 1.280934 0.3364722 -1.6094379
#> 6 1.686399 1.360977 0.5306283 -0.9162907
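Note that `mutate_each_()` and `funs()` were deprecated in later dplyr releases. A possible equivalent, assuming dplyr 1.0.0 or newer is installed, uses `across()` with its `.names` argument to keep the original columns and append the suffixed ones. This is a sketch of an alternative, not the code the answer above was written with.

```r
library(dplyr)

varLabel <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

res <- datasets::iris %>%
  # all_of() selects the columns named in the character vector;
  # .names builds each new column name from the original one
  mutate(across(all_of(varLabel), log, .names = "{.col}_log"))

head(res, 2)
```

Because `.names` is supplied, the source columns are left untouched and the results go into new columns such as `Sepal.Length_log`.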
Upvotes: 4
Reputation: 1942
Try this
logiris<-data.frame(lapply(varLabel,function(x){log(iris[,x])}))
names(logiris)<-paste0("Log-",varLabel)
iris<-cbind(iris,logiris)
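One thing to watch with this approach: names such as `Log-Sepal.Length` contain a hyphen, so they are not syntactic R names and need backticks or `[[ ]]` when accessed. A small sketch, reusing `varLabel` from the question and a hypothetical result name `iris2` so the built-in data is not overwritten:

```r
varLabel <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# Same steps as above, stored under a different name
logiris <- data.frame(lapply(varLabel, function(x) log(datasets::iris[, x])))
names(logiris) <- paste0("Log-", varLabel)
iris2 <- cbind(datasets::iris, logiris)

# The hyphen would be parsed as subtraction without backticks
head(iris2$`Log-Sepal.Length`, 2)
head(iris2[["Log-Sepal.Length"]], 2)
```

Using `paste0(varLabel, "_log")` for the names instead would match the suffix asked for in the question and avoid the need for backticks.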
Upvotes: 4