Reputation: 10629
I would like to calculate the minimum number of consecutive elements in a vector that when added (consecutively) would be less than a given value.
For example in the following vector
ev<-c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 2.7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3.27, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 370.33, 1375.4,
1394.03, 1423.8, 1360, 1269.77, 1378.8, 1350.37, 1425.97, 1423.6,
1363.4, 1369.87, 1365.5, 1294.97, 1362.27, 1117.67, 1026.97,
1077.4, 1356.83, 565.23, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 356.83,
973.5, 0, 240.43, 1232.07, 1440, 1329.67, 1096.87, 1331.37, 1305.03,
1328.03, 1246.03, 1182.3, 1054.53, 723.03, 1171.53, 1263.17,
1200.37, 1054.8, 971.4, 936.4, 968.57, 897.93, 1099.87, 876.43,
1095.47, 1132, 774.4, 1075.13, 982.57, 947.33, 1096.97, 929.83,
1246.9, 1398.2, 1063.83, 1223.73, 1174.37, 1248.5, 1171.63, 1280.57,
1183.33, 1016.23, 1082.1, 795.37, 900.83, 1159.2, 992.5, 967.3,
1440, 804.13, 418.17, 559.57, 563.87, 562.97, 1113.1, 954.87,
883.8, 1207.1, 1046.83, 995.77, 803.93, 1036.63, 946.9, 887.33,
727.97, 733.93, 979.2, 1176.8, 1241.3, 1435.6)
What is the minimum number of elements that when added consecutively (as in the order within the vector) would sum up to lets say 20000
To be more clear i need the following: Start with ev[1] and add consecutively up to 20000. Record the number of elements you had to add in order to get to 20000 as r[1]. Then start with ev[2] and add till 20000 and so on. Recored the number of elements you had to add till 20000 as r[2]. Do this for the entire length of ev. Then return the min(r)
For example
j<-c(1, 2, 3, 5, 7, 9, 2)
.
I want the minimum number of elements that when added consecutively would give lets say >20. This should be 3
(5+7+9)
Thanks a lot
Upvotes: 3
Views: 3634
Reputation: 40871
Well, I'll give it a shot: This one will find the length of the minimum sequence of numbers
that add up to or above max
. It makes no claims to be fast, but it has O(2n)
time complexity :-)
I made it return both the start index and the length.
f <- function(x, max=10) {
s <- 0
len <- Inf
start <- 1
j <- 1
for (i in seq_along(x)) {
s <- s + x[i]
while (s >= max) {
if (i-j+1 < len) {
len <- i-j+1
start <- j
}
s <- s - x[j]
j <- j + 1
}
}
list(start=start, length=len)
# uncomment the line below if you don't need the start index...
#len
}
r <- f(ev, 20000) # list(start=245, length=15)
sum(ev[seq(r$start, len=r$length)]) # 20275.42
# Test speed:
x <- sin(1:1e6)
system.time( r <- f(x, 1.9) ) # 1.54 secs
# Compile the function makes it 9x faster...
g <- compiler::cmpfun(f)
system.time( r <- g(x, 1.9) ) # 0.17 secs
Upvotes: 5
Reputation: 61973
library(zoo) # Needed for rollapply
N <- 20000 # The desired sum we want to achieve
j <- 0
for(i in 1:length(ev)){
k <- rollapply(ev, i, sum)
j[i] <- max(k)
if(j[i] >= N){
break
}
}
i # contains how many consecutive elements you need to sum (15)
j[i] # contains the corresponding sum(20275.42)
Currently this doesn't tell you where the specific subset occurs in the vector but another use of rollapply could get you that information.
There are other ways to do it but if you have a really long vector this will break out of the loop so you don't calculate more than you need. The basic idea is to use rollapply to create a vector of the consecutive sums of length k and then find the maximum of that. If this is less than what we desire do the same thing for sums of length k+1. Repeat until we find a sum that is larger than the desired threshold.
Edit:
This appears to be about 100x faster. I haven't compared it to Tommy's answer (which is probably faster than this but this will provide a significant speedup compared to my original method.
Edit 2: Moving the [-n] and removing the suppresswarnings speeds this up quite a bit.
myfun <- function(ev, N){
i <- 1
n <- length(ev)
j <- ev
repeat{
j <- (j[-n] + ev[-c(1:i)])
i <- i+1
n <- n-1
if(max(j) >= N | i > length(ev)){
break;
}
}
return(i)
}
myfun(ev, 20000)
# And stealing the idea from Tommy gives a nice speedup as well
myfuncomp <- compiler:cmpfun(myfun)
myfuncomp(ev, 20000)
myfunc3 <- compiler:cmpfun(myfun, options = list(optimize = 3))
myfunc3(ev, 20000)
library(rbenchmark) # For testing
# If you have Tommy's functions loaded as f and g you can compare
benchmark(f(ev, 20000), g(ev, 20000), myfun(ev, 20000), myfuncomp(ev, 20000), myfunc3(ev, 20000))
Upvotes: 2
Reputation: 263471
I think this may be a Traveling Salesman Problem in disguise unless you put in some more constraints. You cannot necessarily start at the max ev and go out in either direction since it may be a local non-dense maximum
x=1:length(ev)
plot(x,ev)
lxy <- loess(ev~x )
lines(predict(lxy, x=1:length(y)))
title(main="loess() fit of ev")
But in the region of the most dense values the values are fairly flat.
x=1:length(y); y=c(356.83,
973.5, 0, 240.43, 1232.07, 1440, 1329.67, 1096.87, 1331.37, 1305.03,
1328.03, 1246.03, 1182.3, 1054.53, 723.03, 1171.53, 1263.17,
1200.37, 1054.8, 971.4, 936.4, 968.57, 897.93, 1099.87, 876.43,
1095.47, 1132, 774.4, 1075.13, 982.57, 947.33, 1096.97, 929.83,
1246.9, 1398.2, 1063.83, 1223.73, 1174.37, 1248.5, 1171.63, 1280.57,
1183.33, 1016.23, 1082.1, 795.37, 900.83, 1159.2, 992.5, 967.3,
1440, 804.13, 418.17, 559.57, 563.87, 562.97, 1113.1, 954.87,
883.8, 1207.1, 1046.83, 995.77, 803.93, 1036.63, 946.9, 887.33,
727.97, 733.93, 979.2, 1176.8, 1241.3, 1435.6)
lxyhi <- loess(y~x)
plot(x,y)
lines(predict(lxyhi, x=1:length(y)))
Upvotes: 0
Reputation: 5507
you mean something like this?
> sum(ifelse(cumsum(ev)<=200000, 1, 0))
[1] 364
Upvotes: 0