CadisEtRama

Reputation: 1111

R: Replicating a loop or function multiple times and adding results to data frame for each time

I've written an R loop and turned it into a function that takes in a data frame. The original code and data frame are below. The goal is to repeat this function or loop 1000 times and end up with a data frame that has 1000 columns, each holding the row sums for every row name.

My goal is a data frame that looks like this:

row.names   rsum_s1  rsum_s2  rsum_s3  rsum_s4.....rsum_s1000 
kc231       40       57       15       34
kc25498     34       39       567      23
kc087398    28       3747     25       1938

x is the original data frame and it looks like this:

row.names   val2        val4        val3        val4
kc231       1.62E-08    3.29E-37    1.36E-14    0.29692426
kc25498     4.93E-01    4.93E-01    4.93E-01    0.49330053
kc087398    3.50E-01    1.18E-22    1.71E-08    0.35011743

The loop I first wrote works and gives me rsum_s as a list.

  for(k in 1:length(colnames(x))) {
    as.numeric(x[,k])
    sample(x[,k])
    x[,k]<-rank(x[,k],ties.method="min")
    rsum_s<-rowSums(x)
  }

Output of the loop, the rank sums for each row name: rsum_s

structure(c(47, 142, 82), .Names = c("kc231", "kc25498", "kc087398"))

LOOP converted into FUNCTION

sim<-function(x) { #takes a data.frame
  for(k in 1:length(colnames(x))) {  #each column set as numeric
    as.numeric(x[,k])
    sample(x[,k])  #randomly shuffle values in each column
    x[,k]<-rank(x[,k],ties.method="min") #rank each randomly shuffled columns
    rsum_s<-rowSums(x) #take the sum of the rows
    return(rsum_s)
    }
}

The result of the function is in decimals instead of whole numbers.

sim(dataframe1)
kc231   kc25498 kc087398
18.24   37.47   32.350117 

I'm not sure what I am doing wrong here. I need to either run the loop 1000 times and append the column of rank sums from each run to a data frame, or replicate the function sim 1000 times and convert all the results to a data frame. If anyone can help me complete this task, it would be great.

Any help is much appreciated.

Upvotes: 1

Views: 5190

Answers (1)

flodel

Reputation: 89097

I think this is what you meant to write:

sim <- function(x) { #takes a data.frame
  for(k in 1:ncol(x)) {  #each column set as numeric
    x[,k] <- as.numeric(x[, k])
    x[,k] <- sample(x[, k])  #randomly shuffle values in each column
    x[,k] <- rank(x[, k], ties.method = "min") #rank each randomly shuffled columns
  }
  rsum_s <- rowSums(x) #take the sum of the rows
  return(rsum_s)  
}

Some of the things you did wrong:

  1. as.numeric and sample have no effect unless you assign their result, but most importantly
  2. the rowSums and return had to be moved to the end, outside the for loop; otherwise the function exits after processing the first column, which is why you got decimals (only the first column had been ranked).

The code above is still not very efficient because at each iteration you are replacing the whole x multiple times. I would recommend you look at the apply family of functions and do something like:

sim <- function(x) {
    # shuffle one column, then rank the shuffled values
    process.one.col <- function(z) rank(sample(as.numeric(z)), ties.method = "min")
    y   <- as.data.frame(lapply(x, process.one.col))
    rownames(y) <- rownames(x)
    rowSums(y)
}
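
To get the 1000 columns asked for in the question, one option is to call sim repeatedly with replicate and convert the resulting matrix to a data frame. The following is only a sketch: the toy data frame x reuses the values from the question, but the column names val1 to val4, the variable names n_sims, sims and result, and the set.seed call are assumptions added for illustration.

```r
# sim() as in the apply-style version: shuffle each column, rank it, sum the rows.
sim <- function(x) {
    process.one.col <- function(z) rank(sample(as.numeric(z)), ties.method = "min")
    y <- as.data.frame(lapply(x, process.one.col))
    rownames(y) <- rownames(x)
    rowSums(y)
}

# Toy data using the values shown in the question (column names are assumed).
x <- data.frame(
    val1 = c(1.62e-08, 4.93e-01, 3.50e-01),
    val2 = c(3.29e-37, 4.93e-01, 1.18e-22),
    val3 = c(1.36e-14, 4.93e-01, 1.71e-08),
    val4 = c(0.29692426, 0.49330053, 0.35011743),
    row.names = c("kc231", "kc25498", "kc087398")
)

set.seed(1)      # assumption: only here to make the shuffles reproducible
n_sims <- 1000   # number of repetitions requested in the question

# replicate() calls sim(x) n_sims times; because every call returns a vector
# of the same length, the results are bound into a matrix with one column per run.
sims <- replicate(n_sims, sim(x))

# Convert to a data frame and name the columns rsum_s1 ... rsum_s1000.
result <- as.data.frame(sims)
colnames(result) <- paste0("rsum_s", seq_len(n_sims))
rownames(result) <- rownames(x)
```

Each column of result is one simulated vector of rank sums, so result[, 1:4] has the shape of the table in the question.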

Upvotes: 1
