Reputation: 2240
Considering that I have two vectors, one called residues
and a second one called scores
, which have 31 scores, one for each residue, all positive numbers. To illustrate, the two vectors were obtained as shown below:
residues <- 1:31
scores <- runif(n = 31, min = 0.35, max = 3.54)
I am considering a random sequence just to exemplify.
If I plot the residues
x the scores
I will have the following graphic:
What I want to do is the following: I will consider specific combinations of 15 residues (henceforth referred as 15mer), skipping one residue (i.e. 1:15, 2:16, 3:17 all the way up to 17:31) and I want to calculate the area under the curve (AUC) for all these 17 combinations. My final goal is to select the 15mer that has the highest AUC.
The AUC can be calculated using the rollmean function from the zoo package, as shown in this question. However, as I have, in this example, 17 possible combinations, I am trying to find a script to automatize the process. Thanks in advance.
Upvotes: 1
Views: 271
Reputation: 5951
library(zoo)
set.seed(555)
residues <- 1:31
scores <- runif(n = 31, min = 0.35, max = 3.54)
which.max(sapply(1:17, function(x){sum(diff(residues[x:(x+14)])*rollmean(scores[x:(x+14)],2))}))
# result 7 i.e. 7:21
or
sapply(1:17, function(x){sum(diff(residues[x:(x+14)])*rollmean(scores[x:(x+14)],2))}) # gives you the AUCs
# result [1] 28.52530 29.10203 28.52847 27.65325 27.19925 28.77782 29.29373 28.13133 28.23705 27.68724 25.75294 25.27226 25.44963 25.81201 25.49907 23.48632
#[17] 22.45763
or with a custom function
f_AUC <- function(x, y, lngth){
sapply(1:(length(x)-lngth+1), function(z) sum(diff(x[z:(z+lngth-1)])*rollmean(y[z:(z+lngth-1)],2)))
}
f_AUC(x=residues, y=scores, lngth=15)
Upvotes: 2
Reputation: 3185
Here is the following function i have used
scores <- runif(n = 31, min = 0.35, max = 3.54)
fun <- function(dat, n) {
require(zoo)
N <- which(max(rollmean(dat, n)) == rollmean(dat, n))
output <- matrix(0, length(N), n)
for (i in 1:length(N)) {
output[i, ] <- dat[N[i]:(N[i] + n - 1)]
}
output
}
fun(scores, 15)
Lets run though it inside out
rollmean(dat, n)
from the zoo package as you mentioned gives us the rolling mean of which we
max(rollmean(dat, n))
finds the maximum of the rolling mean
max(rollmean(dat, n)) == rollmean(dat, n)
returns a TRUE/FALSE vector of which of the rolling means are equal to the max
N <- which(max(rollmean(dat, n)) == rollmean(dat, n))
returns the indices of the maximums. Depending on your data you might have multiple sequences that obtain the max we decide to return all of them with the following loop
for (i in 1:length(N)) {
output[i, ] <- dat[N[i]:(N[i] + n -1)]
}
to the result:
set.seed(12345)
scores <- runif(n = 31, min = 0.35, max = 3.54)
fun(scores, 15)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1.588179 1.633928 0.9208938 3.385791 1.797393 1.39234
[,7] [,8] [,9] [,10] [,11] [,12]
[1,] 3.429675 2.606867 2.406091 1.593553 2.578354 2.085545
[,13] [,14] [,15]
[1,] 1.07243 1.895739 2.879693
fun(rpois(1000, 1), 10)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 1 4 2 1 1 3 3 2 2
[2,] 1 4 2 1 1 3 3 2 2 1
[3,] 4 2 1 1 3 3 2 2 1 1
Upvotes: 0