Reputation: 693
I repeatedly encounter this type of task in different contexts in my work. I've used various approaches to address it in the past (usually some awkward combo of lag, diff, etc.), but keep thinking there must be a better, more general, more efficient way. The goal is to label groups in a new variable based on sequential changes in another variable. For example:
var1a <- c("A","A","B","B","B","C","D","D","D","D","D")
should result in a new variable labeling the four groups:
var2a <- c(1, 1, 2, 2, 2, 3, 4, 4, 4, 4, 4)
Somewhat less trivially, this should be based on the grouping of the same values in sequence, not just unique values of var1. For example:
var1b <- c(1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0)
should result in a new variable labeling the four groups:
var2b <- c(1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 4, 4)
And to clarify, when I say "efficient" I'm more interested in straightforward/readable/robust/general than in computationally efficient, though that also has some importance.
Upvotes: 1
Views: 162
Reputation: 263411
And I was going to echo Steve Kern's suggestion to coerce factor to numeric, but use this for the second Q:
> cumsum(c(1, diff(var1b)!=0))
[1] 1 1 1 2 2 3 4 4 4 4 4 4
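Note that `diff()` only works on numeric input, so the one-liner above would fail on the character vector `var1a`. A minimal sketch of the same idea, comparing each element with its predecessor instead, might look like the following. The name `change_id` is made up for illustration, and the function assumes a non-empty vector without `NA` values.

```r
# Hypothetical helper: start a new group whenever an element differs
# from the one before it. Assumes length(x) >= 1 and no NA values.
change_id <- function(x) {
  changed <- x[-1] != x[-length(x)]   # TRUE where a new run starts
  cumsum(c(TRUE, changed))            # the first element always opens group 1
}

var1a <- c("A","A","B","B","B","C","D","D","D","D","D")
var1b <- c(1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0)
change_id(var1a)
# [1] 1 1 2 2 2 3 4 4 4 4 4
change_id(var1b)
# [1] 1 1 1 2 2 3 4 4 4 4 4 4
```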
I would point out that the question was ambiguous w.r.t. what the desired answer to the first Q would be for
var1a <- c("A","A","B","B","B","C","D","D","D","D","D", "a", "A", "B", "B")
The rle approach will give a different answer than the factor approach.
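To make the difference concrete, here is a small sketch on that extended vector (named `var1a_ext` here only to avoid overwriting the original). One caveat: `as.integer(factor(x))` numbers the levels in sorted order, and how "a" sorts relative to "A" can depend on the locale, so this sketch uses `match()` against `unique()` as a stand-in that numbers values by first appearance. That is a substitution for illustration, not the exact code from the other answer.

```r
var1a_ext <- c("A","A","B","B","B","C","D","D","D","D","D", "a", "A", "B", "B")

# Run-based labels: every change opens a new group, even for a repeated value.
r <- rle(var1a_ext)
by_run <- rep(seq_along(r$lengths), r$lengths)
by_run
# [1] 1 1 2 2 2 3 4 4 4 4 4 5 6 7 7

# Value-based labels: the same value always gets the same label.
# match() against unique() numbers values by first appearance.
by_value <- match(var1a_ext, unique(var1a_ext))
by_value
# [1] 1 1 2 2 2 3 4 4 4 4 4 5 1 2 2
```

The two results agree up to the first "a" and diverge afterwards, because the later "A" and "B" runs get new labels under the run-based scheme but reuse labels 1 and 2 under the value-based one.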
Upvotes: 0
Reputation: 59385
You could use run length encoding (?rle):
var1a <- c("A","A","B","B","B","C","D","D","D","D","D")
z <- rle(var1a)
var2a <- rep(1:length(z$lengths),z$lengths)
var2a
# [1] 1 1 2 2 2 3 4 4 4 4 4
var1b <- c(1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0)
z <- rle(var1b)
var2b <- rep(1:length(z$lengths),z$lengths)
var2b
# [1] 1 1 1 2 2 3 4 4 4 4 4 4
Or, more generally,
get.groups <- function(x) with(rle(x),rep(1:length(lengths),lengths))
get.groups(var1a)
# [1] 1 1 2 2 2 3 4 4 4 4 4
get.groups(var1b)
# [1] 1 1 1 2 2 3 4 4 4 4 4 4
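As a usage sketch, the same idea can label runs in a data frame column and then summarise within each run. The data frame `df` and its columns are invented for illustration, and `get_groups` is a slight variant of the function above that uses `seq_along()` to avoid the `1:0` pitfall when there are no runs.

```r
# Variant of get.groups() using seq_along() instead of 1:length().
get_groups <- function(x) {
  r <- rle(x)
  rep(seq_along(r$lengths), r$lengths)
}

# Hypothetical data frame for illustration.
df <- data.frame(state = c(1, 1, 1, 0, 0, 1, 0, 0),
                 value = c(5, 7, 6, 2, 3, 9, 1, 4))
df$run <- get_groups(df$state)
df$run
# [1] 1 1 1 2 2 3 4 4

# Summarise within each run, e.g. the mean of `value` per run.
run_means <- tapply(df$value, df$run, mean)
```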
Upvotes: 3
Reputation: 596
To answer the first question, I would try the following:
var2a <- as.integer(factor(var1a))
For the second question, I would use @jlhoward's suggestion of using rle.
Upvotes: 0