BCArg
BCArg

Reputation: 2240

R - maximize area under curve for multiple scenarios

Considering that I have two vectors, one called residues and a second one called scores, which have 31 scores, one for each residue, all positive numbers. To illustrate, the two vectors were obtained as shown below:

residues <- 1:31
scores <- runif(n = 31, min = 0.35, max = 3.54)

I am considering a random sequence just to exemplify. If I plot the residues x the scores I will have the following graphic:

enter image description here

What I want to do is the following: I will consider specific combinations of 15 residues (henceforth referred as 15mer), skipping one residue (i.e. 1:15, 2:16, 3:17 all the way up to 17:31) and I want to calculate the area under the curve (AUC) for all these 17 combinations. My final goal is to select the 15mer that has the highest AUC.

The AUC can be calculated using the rollmean function from the zoo package, as shown in this question. However, as I have, in this example, 17 possible combinations, I am trying to find a script to automatize the process. Thanks in advance.

Upvotes: 1

Views: 271

Answers (2)

dimitris_ps
dimitris_ps

Reputation: 5951

library(zoo)

set.seed(555)
residues <- 1:31
scores <- runif(n = 31, min = 0.35, max = 3.54)


which.max(sapply(1:17, function(x){sum(diff(residues[x:(x+14)])*rollmean(scores[x:(x+14)],2))}))
# result 7 i.e. 7:21

or

sapply(1:17, function(x){sum(diff(residues[x:(x+14)])*rollmean(scores[x:(x+14)],2))}) # gives you the AUCs
# result [1] 28.52530 29.10203 28.52847 27.65325 27.19925 28.77782 29.29373 28.13133 28.23705 27.68724 25.75294 25.27226 25.44963 25.81201 25.49907 23.48632
        #[17] 22.45763

or with a custom function

f_AUC <- function(x, y, lngth){
  sapply(1:(length(x)-lngth+1), function(z) sum(diff(x[z:(z+lngth-1)])*rollmean(y[z:(z+lngth-1)],2)))
}

f_AUC(x=residues, y=scores, lngth=15)

Upvotes: 2

EmilHvitfeldt
EmilHvitfeldt

Reputation: 3185

Here is the following function i have used

scores <- runif(n = 31, min = 0.35, max = 3.54)

fun <- function(dat, n) {
  require(zoo)
  N <- which(max(rollmean(dat, n)) == rollmean(dat, n))
  output <- matrix(0, length(N), n)
  for (i in 1:length(N)) {
   output[i, ] <- dat[N[i]:(N[i] + n - 1)]
  }
  output
}

fun(scores, 15)

Lets run though it inside out

rollmean(dat, n)

from the zoo package as you mentioned gives us the rolling mean of which we

max(rollmean(dat, n))

finds the maximum of the rolling mean

max(rollmean(dat, n)) == rollmean(dat, n)

returns a TRUE/FALSE vector of which of the rolling means are equal to the max

N <- which(max(rollmean(dat, n)) == rollmean(dat, n))

returns the indices of the maximums. Depending on your data you might have multiple sequences that obtain the max we decide to return all of them with the following loop

for (i in 1:length(N)) {
  output[i, ] <- dat[N[i]:(N[i] + n -1)]
}

to the result:

set.seed(12345)
scores <- runif(n = 31, min = 0.35, max = 3.54)

fun(scores, 15)
         [,1]     [,2]      [,3]     [,4]     [,5]    [,6]
[1,] 1.588179 1.633928 0.9208938 3.385791 1.797393 1.39234
         [,7]     [,8]     [,9]    [,10]    [,11]    [,12]
[1,] 3.429675 2.606867 2.406091 1.593553 2.578354 2.085545
       [,13]    [,14]    [,15]
[1,] 1.07243 1.895739 2.879693

fun(rpois(1000, 1), 10)
    [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    1    4    2    1    1    3    3    2     2
[2,]    1    4    2    1    1    3    3    2    2     1
[3,]    4    2    1    1    3    3    2    2    1     1

Upvotes: 0

Related Questions