Reputation: 143
Is there a faster way to make a counter index than using a loop? For each contiguous run of equal values, the index should be the same. I find the loop very slow, especially when the data is large.
For illustration, here is the input and the desired output:
x <- c(2, 3, 9, 2, 4, 4, 3, 4, 4, 5, 5, 5, 1)
Desired resulting counter:
c(1, 2, 3, 4, 5, 5, 6, 7, 7, 8, 8, 8, 9)
Note that non-contiguous runs have different indexes. E.g. see the desired indexes of the values 2 and 4.
My inefficient code is this:
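n <- length(x)
group <- integer(n)  # preallocate the result
group[1] <- 1
counter <- 1
for (i in 2:n) {
  if (x[i] == x[i - 1]) {
    group[i] <- counter       # same run as the previous element
  } else {
    counter <- counter + 1    # a new run starts here
    group[i] <- counter
  }
}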
Upvotes: 14
Views: 844
Reputation: 52239
With dplyr, you can use consecutive_id:
library(dplyr) #1.1.0+
consecutive_id(x)
# [1] 1 2 3 4 5 5 6 7 7 8 8 8 9
Upvotes: 1
Reputation: 26238
Jota's answer can be simplified further to the following, which will be even faster:
with(rle(x), rep(1:length(lengths), lengths))
# [1] 1 2 3 4 5 5 6 7 7 8 8 8 9
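The speed claim is easy to check on your own data. Here is a rough sketch that times the two-call and one-call rle versions using only base R. The function names two_calls and one_call, the test vector big, and its size are made up for illustration, and timings will vary by machine.

```r
# Hypothetical test data: one million values with many short runs.
set.seed(1)
big <- sample(1:3, 1e6, replace = TRUE)

# Calls rle() twice, as in the longer form.
two_calls <- function(x) rep(1:length(rle(x)$values), times = rle(x)$lengths)

# Calls rle() once, as in the with() form.
one_call <- function(x) with(rle(x), rep(1:length(lengths), lengths))

system.time(a <- two_calls(big))
system.time(b <- one_call(big))

# Both versions should produce the same index.
stopifnot(all(a == b))
```

For more careful measurements, a dedicated benchmarking package would be a better choice than a single system.time call.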
Upvotes: 3
Reputation: 118849
Using data.table, which has the function rleid():
require(data.table) # v1.9.5+
rleid(x)
# [1] 1 2 3 4 5 5 6 7 7 8 8 8 9
Upvotes: 13
Reputation: 17611
This will work with numeric or character values:
rep(1:length(rle(x)$values), times = rle(x)$lengths)
#[1] 1 2 3 4 5 5 6 7 7 8 8 8 9
You can also be a bit more efficient by calling rle just once (about 2x faster), and a very slight speed improvement can be made using rep.int instead of rep:
y <- rle(x)
rep.int(1:length(y$values), times = y$lengths)
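To see why this works, it may help to look at what rle returns for the question's x. This is a minimal sketch of the intermediate values. The name idx is introduced only for illustration, and seq_along is swapped in for 1:length(...) as a stylistic assumption (it also behaves sensibly on empty input).

```r
x <- c(2, 3, 9, 2, 4, 4, 3, 4, 4, 5, 5, 5, 1)
y <- rle(x)

y$values   # one entry per run:  2 3 9 2 4 3 4 5 1
y$lengths  # length of each run: 1 1 1 1 2 1 2 3 1

# Number the runs 1..9, then repeat each number by its run length.
idx <- rep.int(seq_along(y$lengths), times = y$lengths)
idx
# [1] 1 2 3 4 5 5 6 7 7 8 8 8 9
```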
Upvotes: 7
Reputation: 206411
If you have numeric values like this, you can use diff and cumsum to add up changes in values:
x <- c(2,3,9,2,4,4,3,4,4,5,5,5,1)
cumsum(c(1,diff(x)!=0))
# [1] 1 2 3 4 5 5 6 7 7 8 8 8 9
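Since diff only works on numeric input, one possible variation compares each element with its predecessor directly, which should also handle character vectors. This is a sketch rather than part of the answer above: the helper name run_id is hypothetical, and it assumes x contains no NA values (an NA comparison would propagate NA through cumsum).

```r
# Hypothetical helper: run index via element-wise comparison.
# Assumes x is an atomic vector without NA values.
run_id <- function(x) {
  n <- length(x)
  if (n == 0L) return(integer(0))
  # TRUE marks the start of a new run; the first element always starts one.
  cumsum(c(TRUE, x[-1L] != x[-n]))
}

run_id(c(2, 3, 9, 2, 4, 4, 3, 4, 4, 5, 5, 5, 1))
# [1] 1 2 3 4 5 5 6 7 7 8 8 8 9
run_id(c("a", "a", "b", "a"))
# [1] 1 1 2 3
```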
Upvotes: 13