Reputation: 23
I am working on a data frame of raw numeric data. Each column holds one parameter (for example, expression data for gene xyz) and each row holds one subject. Some columns are normally distributed, while others are far from it. I ran Shapiro-Wilk tests on several transformations of each column, using apply() with MARGIN = 2, and picked a suitable transformation per column by comparing shapiro.test()$p.value. I stored each pick as a character string, which gives me a vector of NA, "log10" and "sqrt" entries of length ncol(DataFrame). Is it possible to apply this vector to the data frame with an apply-family function or, if necessary, a for loop? How do I do this, or is there a better way? I guess I could loop over if-else statements, but there has to be a more efficient way, because my code is already slow.
Thanks all!
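A minimal sketch of the selection step described above, for illustration only. The names demo_df, candidates and picks are hypothetical, the data are simulated, and the rule "keep the candidate with the largest Shapiro-Wilk p-value" is an assumption about how the choice was made:

```r
set.seed(1)

# Hypothetical data: strictly positive columns, so log10() and sqrt() are defined
demo_df <- data.frame(
  a = rnorm(30, mean = 10, sd = 1),
  b = rlnorm(30),
  c = rexp(30) + 1
)

# Candidate transformations; "none" leaves the column unchanged
candidates <- list(none = identity, log10 = log10, sqrt = sqrt)

# For each column, run shapiro.test() on every candidate transformation
# and keep the name of the one with the largest p-value
picks <- vapply(demo_df, function(col) {
  p_values <- vapply(candidates,
                     function(f) shapiro.test(f(col))$p.value,
                     numeric(1))
  names(p_values)[which.max(p_values)]
}, character(1))

picks
```

Here picks is a named character vector with one entry per column, comparable to the vector of NA, "log10" and "sqrt" described above (with "none" in place of NA).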
Update: I tried the code below, but it fails with "Error in file(filename, "r") : invalid 'description' argument":
exampleDF <- as.data.frame(matrix(c(1,2,3,4,1,10,100,1000,0.1,0.2,0.3,0.4), ncol=3, nrow = 4))
transformationVector <- c(NA, "log10", NA)
TransformedExampleDF <- apply(exampleDF, 2 , function(x) eval(parse(paste(transformationVector , "(" , x , ")" , sep = "" ))))
Upvotes: 1
Views: 236
Reputation: 51
An alternative to Dunois's solution, using mapply:
set.seed(100)
## Example functions
# Example function 1
myaddtwo <- function(x){
  if(is.numeric(x)){
    x <- x + 2
  } else{
    warning("Input must be numeric!")
  }
  # Constraints such as the one shown above
  # can be added elsewhere to prevent
  # inappropriate action
  return(x)
}
# Example function 2
mymulttwo <- function(x){
  return(x * 2)
}
# Example function 3
mysqrt <- function(x){
  return(sqrt(x))
}
# Example function 4
myna <- function(x){
  return(NA)
}
# Ordered vector of function names (specify them in the order of the
# columns to be processed).
my_func_list <- c("myaddtwo", "mymulttwo", "mysqrt", "myna")
## Sample dataset
my_df <- data.frame(matrix(sample(1:100, 40, replace = TRUE),
                           nrow = 10, ncol = 4), stringsAsFactors = FALSE)
# mapply is a multivariate version of 'sapply'. Like 'sapply', it loops
# over the elements of its input and applies a function to each one. The
# difference is that 'mapply' walks several inputs in parallel and passes
# one element of each to the function on every call. Here, that pairs
# each column of the data frame with one function name.
my_df1 <- data.frame(
  mapply(function(column, appliedFunction) {
    get(appliedFunction)(column)
  }, column = my_df, appliedFunction = my_func_list))
my_df
# X1 X2 X3 X4
# 74 7 18 91
# 89 7 25 39
# 78 55 2 16
# 23 43 51 75
# 86 82 68 66
# 70 61 68 70
# 4 12 52 93
# 55 99 48 45
# 70 51 32 30
# 98 72 85 30
my_df1
# X1 X2 X3 X4
# 76 14 4.242641 NA
# 91 14 5.000000 NA
# 80 110 1.414214 NA
# 25 86 7.141428 NA
# 88 164 8.246211 NA
# 72 122 8.246211 NA
# 6 24 7.211103 NA
# 57 198 6.928203 NA
# 72 102 5.656854 NA
# 100 144 9.219544 NA
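The same idea can be sketched against the data from the question. This is a hedged illustration, not part of the original answer: the helper pick_function() is a hypothetical name, and treating NA as "leave the column unchanged" is an assumption about what the asker intends. It also sidesteps eval(parse(...)), which fails in the question because the first argument of parse() is a file, not the text to parse.

```r
# Data and transformation vector from the question
exampleDF <- as.data.frame(matrix(c(1, 2, 3, 4,
                                    1, 10, 100, 1000,
                                    0.1, 0.2, 0.3, 0.4),
                                  ncol = 3, nrow = 4))
transformationVector <- c(NA, "log10", NA)

# Hypothetical helper: NA is assumed to mean "no transformation";
# any other entry is looked up as a function name
pick_function <- function(name) {
  if (is.na(name)) identity else match.fun(name)
}

# Pair each column with its entry in transformationVector.
# SIMPLIFY = FALSE keeps the result as a list of columns, and assigning
# into transformedDF[] preserves the data frame structure and column names.
transformedDF <- exampleDF
transformedDF[] <- mapply(function(column, name) pick_function(name)(column),
                          exampleDF, transformationVector,
                          SIMPLIFY = FALSE)

transformedDF
```

With this input, only V2 is transformed (to 0, 1, 2, 3); V1 and V3 are returned as they were.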
Upvotes: 0
Reputation: 1843
You could do something like this. In the example below, I've cooked up four arbitrary functions whose names I've then stored in the character vector my_func_list. (Note: the last function converts the data to NA; that is intentional.)
Then I created another function, func_to_df(), that accepts the data.frame and the vector of function names (func_list) as inputs and applies each function (looked up by name with get()) to the corresponding column of the data.frame. The result is returned and, in this example, stored in the data.frame my_df1.
tl;dr: just look at what func_to_df() does. It might also be worthwhile looking into the purrr package (although it hasn't been used here).
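As a small illustration of the name lookup that func_to_df() relies on, here is a minimal sketch using the base function sqrt(). The variable names func_name, f and g are hypothetical and only serve the example:

```r
# A function name stored as a character string
func_name <- "sqrt"

# get() returns the object bound to that name; here, the function sqrt().
# mode = "function" restricts the lookup to functions, in case a
# non-function variable with the same name exists.
f <- get(func_name, mode = "function")
f(16)

# match.fun() performs a similar lookup and only ever returns a function
g <- match.fun(func_name)
identical(f(16), g(16))

# The result of the lookup can be called directly, as func_to_df() does
get(func_name)(c(4, 9))
```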
#---------------------
#Example function 1
myaddtwo <- function(x){
  if(is.numeric(x)){
    x <- x + 2
  } else{
    warning("Input must be numeric!")
  }
  #Constraints such as the one shown above
  #can be added elsewhere to prevent
  #inappropriate action
  return(x)
}
#Example function 2
mymulttwo <- function(x){
  return(x * 2)
}
#Example function 3
mysqrt <- function(x){
  return(sqrt(x))
}
#Example function 4
myna <- function(x){
  return(NA)
}
#---------------------
#Dummy data
my_df <- data.frame(
  matrix(sample(1:100, 40, replace = TRUE),
         nrow = 10, ncol = 4),
  stringsAsFactors = FALSE)
#User somehow ascertains that
#the following order of functions
#is the right one to be applied to the data.frame
my_func_list <- c("myaddtwo", "mymulttwo", "mysqrt", "myna")
#---------------------
#A function which applies
#the functions named in func_list
#to the corresponding columns of df
func_to_df <- function(df, func_list){
  #seq_along() is safe even if func_list is empty
  for(i in seq_along(func_list)){
    df[, i] <- get(func_list[i])(df[, i])
    #Alternative to get()
    #df[, i] <- eval(as.name(func_list[i]))(df[, i])
  }
  return(df)
}
#---------------------
#Execution
my_df1 <- func_to_df(my_df, my_func_list)
#---------------------
#Output
my_df
# X1 X2 X3 X4
# 1 8 85 6 41
# 2 45 7 8 65
# 3 34 80 16 89
# 4 34 62 9 31
# 5 98 47 51 99
# 6 77 28 40 72
# 7 24 7 41 46
# 8 45 80 75 30
# 9 93 25 39 72
# 10 68 64 87 47
my_df1
# X1 X2 X3 X4
# 1 10 170 2.449490 NA
# 2 47 14 2.828427 NA
# 3 36 160 4.000000 NA
# 4 36 124 3.000000 NA
# 5 100 94 7.141428 NA
# 6 79 56 6.324555 NA
# 7 26 14 6.403124 NA
# 8 47 160 8.660254 NA
# 9 95 50 6.244998 NA
# 10 70 128 9.327379 NA
#---------------------
Upvotes: 1