Reputation: 7464
Imagine a series of numbers like
    c(21,22,23,30,31,32,34,35,36,37,38,50,NA,52)
where subseries are defined as follows: x[t] belongs to the same subseries as x[t-1] if x[t] = x[t-1] + 1.
So in the example above we have the following subseries:
    c(21,22,23,30,31,32,34,35,36,37,38,50,NA,52)
    ## 1  1  1  2  2  2  3  3  3  3  3  4  -  5  # series ID
    ##    3   |    3   |       5      | 1|  | 1  # length
What would be the most efficient way of tagging the subseries and counting their lengths (as a single function or two separate ones)?
Upvotes: 1
Views: 63
Reputation: 7464
I'm accepting the answer by akrun (with a contribution by David Arenburg), but for reference, here is an Rcpp solution I wrote in the meantime.
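    #include <Rcpp.h>
    using namespace Rcpp;

    // For every element of x, return the length of the run of consecutive
    // integers (x[t] == x[t-1] + 1) that the element belongs to.
    // [[Rcpp::export]]
    NumericVector cpp_seriesLengths(NumericVector x) {
      int n = x.length();
      if (n == 1)
        return wrap(1);
      NumericVector out(n);
      int tmpCount = 1;   // length of the run currently being scanned
      int prevStart = 0;  // index where that run starts
      for (int i = 0; i < (n-1); i++) {
        if ( x[i] == (x[i+1] - 1) ) {
          tmpCount += 1;
        } else {
          // The run ends at i (an NA never compares equal, so it also ends
          // a run): write the run length to all of its elements.
          for (int j = prevStart; j <= i; j++)
            out[j] = tmpCount;
          tmpCount = 1;
          prevStart = i+1;
        }
      }
      // Write out the last run.
      for (int j = prevStart; j < n; j++)
        out[j] = tmpCount;
      return out;
    }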
Upvotes: 1
Reputation: 887601
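With v1 holding the vector from the question, we can take the differences between adjacent elements, flag every position where the difference is not 1 (the start of a new run), take the cumulative sum of those flags to get a grouping variable, and compute the length of each group. NA is replaced with 0 first so that diff() never returns NA.
    unname(tapply(v1, cumsum(c(TRUE, diff(replace(v1, is.na(v1), 0))!=1)), length))
    #[1] 3 3 5 1 1 1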
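To make the one-liner easier to follow, here is the same computation split into steps. This is only an unpacking of the expression above; the intermediate names (`filled`, `starts`, `id`, `run_lengths`) are made up for illustration.

```r
v1 <- c(21, 22, 23, 30, 31, 32, 34, 35, 36, 37, 38, 50, NA, 52)

# Step 1: replace NA with 0 so that diff() never returns NA.
filled <- replace(v1, is.na(v1), 0)

# Step 2: a run starts at position 1 and wherever the step from the
# previous element is not exactly 1.
starts <- c(TRUE, diff(filled) != 1)

# Step 3: the cumulative sum of the start flags gives a run ID per element.
id <- cumsum(starts)
id
# [1] 1 1 1 2 2 2 3 3 3 3 3 4 5 6

# Step 4: count the elements in each run.
run_lengths <- unname(tapply(v1, id, length))
run_lengths
# [1] 3 3 5 1 1 1
```

Note that in this form the NA element gets a run ID of its own (5), so the last element is tagged 6 rather than 5.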
If we need the NA elements as "":
    unname(tapply(v1, cumsum(c(TRUE, diff(replace(v1, is.na(v1), 0))!=1)),
                  function(x) if(all(is.na(x))) "" else length(x)))
    #[1] "3" "3" "5" "1" "" "1"
Or a variation posted by @DavidArenburg with rle:
    rle(cumsum(c(TRUE, diff(replace(v1, is.na(v1), 0))!=1)))$lengths
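The snippets above return the run lengths only. For the tagging half of the question, a minimal sketch along the same lines might look like this. `tag_runs` is a hypothetical helper name, not something from the answers. It tests for NA directly instead of substituting 0 (with the substitution, an NA followed by a 1 would be counted as one run), and it leaves NA elements untagged, as in the ID row of the question.

```r
# Hypothetical helper: tag each element with a run ID and its run length.
tag_runs <- function(x) {
  d <- diff(x)
  # A run starts at the first element and wherever the step from the previous
  # element is not exactly 1, or is unknown because an NA is involved.
  starts <- c(TRUE, is.na(d) | d != 1)[seq_along(x)]

  # Per-element run length: repeat each run's length once per member.
  r <- rle(cumsum(starts))
  len <- rep(r$lengths, r$lengths)

  # Number only the runs that hold real values; NA elements stay untagged.
  ok <- !is.na(x)
  id <- cumsum(starts & ok)
  id[!ok] <- NA
  len[!ok] <- NA

  data.frame(x = x, id = id, length = len)
}

v1 <- c(21, 22, 23, 30, 31, 32, 34, 35, 36, 37, 38, 50, NA, 52)
res <- tag_runs(v1)
# res$id:     1 1 1 2 2 2 3 3 3 3 3 4 NA 5
# res$length: 3 3 3 3 3 3 5 5 5 5 5 1 NA 1
```

Returning a data frame keeps the IDs and lengths aligned with the input; splitting this into two functions would work just as well.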
Upvotes: 3