user3816990

Reputation: 247

Combine a function and for loop

I have expression data for different tissues, like so:

 tissueA tissueB tissueC
gene1    4.5 6.2 5.8
gene2    3.2 4.7 6.6

I want to calculate a summary statistic for each gene i, defined as

x_i = Σ_{j=1}^{n} [1 - log2(x_{i,j}) / log2(x_{i,max})] / (n - 1)

where n is the number of tissues (here 3), x_{i,j} is the value for gene i in tissue j, and x_{i,max} is the highest value for gene i across the n tissues (e.g. 6.2 for gene1).

Since I have to do this calculation for each tissue of every gene (the sum runs from j = 1 to n) and then add up the results, I wrote a for loop:

for (i in seq_along(x)) {
  my.max <- max(x[, i])
  my.statistic <- 1 - log2(x[, i]) / log2(my.max)
  my.sum <- sum(my.statistic)
  my.answer <- my.sum / 2  # n - 1 = 3 - 1 = 2
}

However, I am not sure how to apply this for loop to each row. Normally I would write a function and call apply(x, 1, function(x) ...), but I am not sure how a for loop can be turned into a function.

For gene1, for example, the expected output would be

(1-log2(4.5)/log2(6.2))/2 + (1-log2(5.8)/log2(6.2))/2 =0.1060983
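
The arithmetic above can be checked directly in R. This is only a minimal sketch using the gene1 values from the table; the variable names (gene1, terms, x_gene1) are made up for illustration. Note that the term for the maximum tissue (6.2) is zero, which is why only two terms appear in the hand calculation.

```r
# gene1 values taken from the example table
gene1 <- c(tissueA = 4.5, tissueB = 6.2, tissueC = 5.8)

n <- length(gene1)                            # number of tissues
terms <- 1 - log2(gene1) / log2(max(gene1))   # one term per tissue; the max tissue gives 0
x_gene1 <- sum(terms) / (n - 1)               # divide the sum by n - 1

print(x_gene1)
```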

Upvotes: 7

Views: 1081

Answers (2)

Veerendra Gadekar

Reputation: 4472

If you have a huge data set, you can use plyr's adply(), which can be faster than apply():

library(plyr)
adply(df, 1, function(x)
  data.frame(my.stat = sum(1 - log2(x[, x != max(x)]) / log2(max(x))) / (length(x) - 1)))

#tissueA tissueB tissueC   my.stat
#1     4.5     6.2     5.8 0.1060983
#2     3.2     4.7     6.6 0.2817665
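
If speed matters, another option worth timing is a mostly vectorized base R version that does the division and summation on the whole matrix at once. This is a sketch that is not part of the answer above; it assumes df contains only numeric columns and strictly positive values, and the names m and row_max are made up for illustration.

```r
# Example data (same values as in the question)
df <- read.table(text = "tissueA tissueB tissueC
gene1 4.5 6.2 5.8
gene2 3.2 4.7 6.6")

m <- as.matrix(df)               # numeric matrix, genes in rows
row_max <- apply(m, 1, max)      # maximum per gene

# row_max is recycled down the columns, so each row is divided by its own maximum
my.stat <- rowSums(1 - log2(m) / log2(row_max)) / (ncol(m) - 1)

print(my.stat)
```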

Upvotes: 6

zx8754

Reputation: 56219

Try this:

#data
df <- read.table(text=" tissueA tissueB tissueC
gene1    4.5 6.2 5.8
gene2    3.2 4.7 6.6")

#result
apply(df,1,function(i){
  my.max <- max(i)
  my.statistic <- 
    (1-log2(i)/log2(my.max))
  my.sum <- sum(my.statistic)
  my.answer <- my.sum/(length(i)-1)
  my.answer
})

#result
#     gene1     gene2 
# 0.1060983 0.2817665 
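
To make the link between a for loop and a function explicit, the body of the calculation can be pulled into a named function and then called either from a loop over rows or from apply(). The following is a sketch under the same assumptions as the code above; tissue_stat, res_loop and res_apply are hypothetical names chosen for illustration.

```r
# Hypothetical helper: the statistic for one gene (one row of values)
tissue_stat <- function(values) {
  sum(1 - log2(values) / log2(max(values))) / (length(values) - 1)
}

df <- read.table(text = "tissueA tissueB tissueC
gene1 4.5 6.2 5.8
gene2 3.2 4.7 6.6")

# Explicit for loop over rows, calling the function once per gene
res_loop <- numeric(nrow(df))
names(res_loop) <- rownames(df)
for (i in seq_len(nrow(df))) {
  res_loop[i] <- tissue_stat(unlist(df[i, ]))
}

# The same calculation with apply(), which does the row loop for you
res_apply <- apply(df, 1, tissue_stat)

print(res_loop)
print(res_apply)
```

Both versions should give the same values; apply() simply hides the bookkeeping of the loop index and the result vector.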

Upvotes: 5
