I have a vector of numbers:
SampleVector <- c(2,4,7,8,9,12,14,16,17,19,23,24,25,26,27,29)
I want to find the indices of elements at the start and finish of sequences that increase by 1, but I also want the indices of elements that are not part of a sequence.
Another way of saying the same thing: I want the indices of all elements that are not inside single-step sequences.
For the SampleVector, the indices I want are:
DesiredIndices <- c(1,2,3,5,6,7,8,9,10,11,15,16)
That is, everything except the number 8 (as it is inside the 7:9 sequence) and the numbers 24, 25 and 26 (as they are inside the 23:27 sequence).
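A quick way to check that DesiredIndices matches this description is to index SampleVector with it, and with its complement:

```r
SampleVector <- c(2,4,7,8,9,12,14,16,17,19,23,24,25,26,27,29)
DesiredIndices <- c(1,2,3,5,6,7,8,9,10,11,15,16)

# Values that should be kept: sequence starts, sequence ends and isolated values
SampleVector[DesiredIndices]
# [1]  2  4  7  9 12 14 16 17 19 23 27 29

# Values that should be dropped: those strictly inside a step-1 run
SampleVector[-DesiredIndices]
# [1]  8 24 25 26
```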
My best attempt so far is:
SequenceStartAndEndIndices <- function(vector){
  DifferenceVector <- diff(vector)
  DiffRunLength <- rle(DifferenceVector)
  IndicesOfSingleElements <- which(DifferenceVector > 1) + 1
  IndicesOfEndOfSequences <- cumsum(DiffRunLength$lengths)[
    which((DiffRunLength$lengths * DiffRunLength$values) == DiffRunLength$lengths)
  ] + 1
  IndicesOfStartsOfSequences <- c(1, head(IndicesOfEndOfSequences + 1, -1))
  UniqueIndices <- unique(c(IndicesOfStartsOfSequences, IndicesOfEndOfSequences, IndicesOfSingleElements))
  SortedIndices <- UniqueIndices[order(UniqueIndices)]
  return(SortedIndices)
}
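To make this attempt easier to follow, it can help to print the intermediate objects it builds for SampleVector. Every 1 in the differences marks a consecutive pair, so a run of k ones in the run-length encoding covers a sequence of k + 1 elements. Note that the condition (lengths * values) == lengths reduces to values == 1, because every run length is at least 1. This is an inspection sketch only, not a replacement for the function:

```r
SampleVector <- c(2,4,7,8,9,12,14,16,17,19,23,24,25,26,27,29)

# Step between each element and the next; 1 means "consecutive"
DifferenceVector <- diff(SampleVector)
print(DifferenceVector)
# [1] 2 3 1 1 3 2 2 1 2 4 1 1 1 1 2

# Run-length encoding of those steps
DiffRunLength <- rle(DifferenceVector)
print(DiffRunLength$lengths)
# [1] 1 1 2 1 2 1 1 1 4 1
print(DiffRunLength$values)
# [1] 2 3 1 3 2 1 2 4 1 2

# Index of the last element of each step-1 run (same result as the
# lengths * values == lengths test in the function)
EndIndices <- cumsum(DiffRunLength$lengths)[DiffRunLength$values == 1] + 1
print(EndIndices)
# [1]  5  9 15
```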
This function gives me the correct answer:
> SequenceStartAndEndIndices(vector = SampleVector)
[1] 1 2 3 5 6 7 8 9 10 11 15 16
...but it is almost impossible to follow, and I am not sure how well it generalizes to other inputs. Is there a better way, or maybe an existing function in a package somewhere?
As background, the purpose of this is to help turn a long vector of distance markers into something reasonably human-readable, e.g. instead of "at kilometres: 1,8,9,10,11,13" I'll be able to write "at kilometres: 1, 8 to 11 and 13".
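A minimal sketch of that final formatting step in base R, assuming the markers are sorted whole numbers. The name format_markers is hypothetical, not an existing package function. It groups values into step-1 runs with cumsum() over the differences and joins the last piece with "and":

```r
# Hypothetical helper: collapse sorted whole-number markers into a readable phrase,
# e.g. c(1, 8, 9, 10, 11, 13) -> "1, 8 to 11 and 13"
format_markers <- function(x) {
  if (length(x) == 0) return("")
  # A new group starts wherever the step from the previous value is not exactly 1
  group <- cumsum(c(TRUE, diff(x) != 1))
  pieces <- vapply(split(x, group), function(run) {
    if (length(run) == 1) {
      as.character(run)
    } else {
      paste(run[1], run[length(run)], sep = " to ")
    }
  }, character(1))
  pieces <- unname(pieces)
  n <- length(pieces)
  if (n == 1) return(pieces)
  # Comma-separate all but the last piece, then attach the last one with "and"
  paste(paste(pieces[-n], collapse = ", "), pieces[n], sep = " and ")
}

format_markers(c(1, 8, 9, 10, 11, 13))
# [1] "1, 8 to 11 and 13"
```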
You can try tapply in base R to create groups of consecutive numbers.
SampleVector <- c(2,4,7,8,9,12,14,16,17,19,23,24,25,26,27,29)
toString(tapply(SampleVector,
                cumsum(c(TRUE, diff(SampleVector) > 1)),
                function(x) {
                  if (length(x) == 1) x else paste(x[1], x[length(x)], sep = ' to ')
                }))
#[1] "2, 4, 7 to 9, 12, 14, 16 to 17, 19, 23 to 27, 29"
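The same grouping idea can also recover the indices the question asks for. Instead of the values, split the positions by run and keep the first and last position of each run. This sketch uses diff(...) != 1 rather than > 1; for a strictly increasing vector like SampleVector the two produce the same groups:

```r
SampleVector <- c(2,4,7,8,9,12,14,16,17,19,23,24,25,26,27,29)

# Run identifier: increments wherever the step to the previous value is not 1
runs <- cumsum(c(TRUE, diff(SampleVector) != 1))

# range() gives the first and last position of each run; unique() collapses
# the two identical positions produced by a single-element run
idx <- unique(unlist(lapply(split(seq_along(SampleVector), runs), range),
                     use.names = FALSE))
idx
# [1]  1  2  3  5  6  7  8  9 10 11 15 16
```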
This should work because the index of a value is excluded only when both of these hold: 1) the value is exactly 1 larger than the previous value, and 2) it is exactly 1 smaller than the next value.
> x <- diff(SampleVector)
> seq_along(SampleVector)[!(c(0, x) == 1 & c(x, 0) == 1)]
[1] 1 2 3 5 6 7 8 9 10 11 15 16
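The same comparison can be wrapped in a reusable function. This is a sketch: sequence_boundary_indices is a hypothetical name, and the explicit guard for vectors shorter than 2 is an added assumption so that empty and single-element inputs behave predictably:

```r
# Hypothetical wrapper around the neighbour-difference test above.
# An index is dropped only when its value is exactly 1 more than the previous
# value AND exactly 1 less than the next value, i.e. strictly inside a run.
sequence_boundary_indices <- function(v) {
  if (length(v) < 2) return(seq_along(v))
  d <- diff(v)
  # Padding with 0 ensures the first and last elements are never "interior"
  interior <- c(0, d) == 1 & c(d, 0) == 1
  which(!interior)
}

SampleVector <- c(2,4,7,8,9,12,14,16,17,19,23,24,25,26,27,29)
sequence_boundary_indices(SampleVector)
# [1]  1  2  3  5  6  7  8  9 10 11 15 16
```

Because it only compares neighbouring differences, it assumes the input is sorted and made of whole numbers; on unsorted input a decrease simply ends a run.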