Reputation: 1111
I've written an R loop and turned it into a function that takes a data frame. The original code and data frame are below. The goal is to repeat this function or loop 1000 times and end up with a data frame that has 1000 columns holding the row sums for each row name.
My goal is a data frame that looks like this:
row.names rsum_s1 rsum_s2 rsum_s3 rsum_s4.....rsum_s1000
kc231 40 57 15 34
kc25498 34 39 567 23
kc087398 28 3747 25 1938
x is the original data frame and it looks like this:
row.names val2 val4 val3 val4
kc231 1.62E-08 3.29E-37 1.36E-14 0.29692426
kc25498 4.93E-01 4.93E-01 4.93E-01 0.49330053
kc087398 3.50E-01 1.18E-22 1.71E-08 0.35011743
The loop I first wrote works and gives me rsum_s as a named vector:
for(k in 1:length(colnames(x))) {
as.numeric(x[,k])
sample(x[,k])
x[,k]<-rank(x[,k],ties.method="min")
rsum_s<-rowSums(x)
}
Output of the loop, the rank sums for each row name, stored in rsum_s:
structure(c(47, 142, 82), .Names = c("kc231", "kc25498", "kc087398"))
The loop converted into a function:
sim<-function(x) { #takes a data.frame
for(k in 1:length(colnames(x))) { #each column set as numeric
as.numeric(x[,k])
sample(x[,k]) #randomly shuffle values in each column
x[,k]<-rank(x[,k],ties.method="min") #rank each randomly shuffled columns
rsum_s<-rowSums(x) #take the sum of the rows
return(rsum_s)
}
}
The result of the function is decimals instead of whole numbers:
sim(dataframe1)
kc231 kc25498 kc087398
18.24 37.47 32.350117
I'm not sure what I am doing wrong here. I need to either run the loop 1000 times and append the column of rank sums from each run to a data frame, or replicate the function sim 1000 times and convert all the results to a data frame. If anyone can help me complete this task, that would be great.
Any help is much appreciated.
Upvotes: 1
Views: 5190
Reputation: 89097
I think this is what you meant to write:
sim <- function(x) { #takes a data.frame
for(k in 1:ncol(x)) { #each column set as numeric
x[,k] <- as.numeric(x[, k])
x[,k] <- sample(x[, k]) #randomly shuffle values in each column
x[,k] <- rank(x[, k], ties.method = "min") #rank each randomly shuffled columns
}
rsum_s <- rowSums(x) #take the sum of the rows
return(rsum_s)
}
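The original version differs from this one because of two general R behaviors, which can be reproduced in isolation. This is only a minimal sketch: the vector `v` and the helper `first_only` are made up for illustration and are not part of your code.

```r
# Behavior 1: calling a function without assigning its result changes nothing.
v <- c(0.3, 0.1, 0.2)
rank(v, ties.method = "min")                 # ranks are computed, then discarded
unchanged <- identical(v, c(0.3, 0.1, 0.2))  # TRUE: v still holds the raw values

v_ranked <- rank(v, ties.method = "min")     # assigning keeps the result

# Behavior 2: return() inside a for loop ends the function immediately.
# `first_only` is a hypothetical helper used only for this illustration.
first_only <- function(n) {
  for (k in 1:n) {
    return(k)  # reached on the first iteration, so later iterations never run
  }
}
first_result <- first_only(5)  # the loop never gets past k = 1
```

In the original sim, this means only the first column had been replaced by ranks when rowSums was taken, so the raw decimal values of the other columns were added to it.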
Some of the things you did wrong:
as.numeric and sample had no effect unless you assign the result, but most importantly rowSums and return had to be moved to the end, outside the for loop, otherwise the function would exit after processing the first column.
The code above is still not very efficient because at each iteration you are replacing the whole x multiple times. I would recommend you look at the apply family of functions, do something like:
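sim <- function(x) {
  # shuffle one column, then rank the shuffled values
  process.one.col <- function(z) rank(sample(as.numeric(z)), ties.method = "min")
  y <- as.data.frame(lapply(x, process.one.col)) # apply to every column
  rownames(y) <- rownames(x) # keep the original row names
  rowSums(y) # rank sum for each row
}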
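For the 1000 repetitions you asked about, one option is base R's replicate, which calls an expression repeatedly and binds the results into a matrix. The sketch below is only one way to do it. It rebuilds the sample data from the question, and the column names val1 to val4, the seed, and the names n_sims and res are my own assumptions.

```r
# Compact version of the apply-based sim() so the example runs on its own.
sim <- function(x) {
  ranked <- lapply(x, function(z) rank(sample(as.numeric(z)), ties.method = "min"))
  y <- as.data.frame(ranked)
  rownames(y) <- rownames(x)
  rowSums(y)
}

# Sample data from the question; the column names val1..val4 are assumed.
x <- data.frame(
  val1 = c(1.62e-08, 4.93e-01, 3.50e-01),
  val2 = c(3.29e-37, 4.93e-01, 1.18e-22),
  val3 = c(1.36e-14, 4.93e-01, 1.71e-08),
  val4 = c(0.29692426, 0.49330053, 0.35011743),
  row.names = c("kc231", "kc25498", "kc087398")
)

set.seed(1)     # optional, only to make the shuffles reproducible
n_sims <- 1000  # number of repetitions

# replicate() returns a matrix: one row per row name, one column per run.
res <- as.data.frame(replicate(n_sims, sim(x)))
colnames(res) <- paste0("rsum_s", seq_len(n_sims))
```

res then has one row per row name and the columns rsum_s1 to rsum_s1000, which matches the layout shown in the question.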
Upvotes: 1