Hakki

Reputation: 1472

Adding a grouping indicator for repeating sequences

I thought this would be simple, but I failed and can't find an answer anywhere.

My example data looks like this. I have nro running from 1 to x, restarting at random points. I would like to create an ind variable that is 1 for the first run, 2 for the second, and so on.

library(dplyr)  # provides tibble(), mutate(), lag() and %>%

tbl <- tibble(nro = c(rep(1:3, 1), rep(1:5, 1), rep(1:4, 1)))

End result should look like this:

tibble(nro = c(rep(1:3, 1), rep(1:5, 1), rep(1:4, 1)),
       ind = c(rep(1, 3), rep(2, 5), rep(3, 4)))

 # A tibble: 12 x 2
     nro   ind
   <int> <dbl>
 1     1     1
 2     2     1
 3     3     1
 4     1     2
 5     2     2
 6     3     2
 7     4     2
 8     5     2
 9     1     3
10     2     3
11     3     3
12     4     3

I thought I could do something with ifelse but failed miserably.

tbl %>%
  mutate(ind = ifelse(nro < lag(nro), 1 + lag(ind), 1))

I assume this needs some kind of loop.

Upvotes: 2

Views: 63

Answers (1)

loki

Reputation: 10360

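For sequences of the same length

If every run has the same length (the output shown here uses nro = rep(1:4, 3) rather than the question's data), you can group_by() the nro variable and take row_number() within each group: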

tbl %>% 
  group_by(nro) %>% 
  mutate(ind = row_number())

# A tibble: 12 x 2
# Groups:   nro [4]
#      nro   ind
#    <int> <int>
#  1     1     1
#  2     2     1
#  3     3     1
#  4     4     1
#  5     1     2
#  6     2     2
#  7     3     2
#  8     4     2
#  9     1     3
# 10     2     3
# 11     3     3
# 12     4     3
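To see why this only suits equal-length runs, here is a minimal base R sketch. It assumes ave() with seq_along as a stand-in for group_by() plus row_number() (my substitution, not part of the original answer) and applies it to the question's unequal runs:

```r
# Question's data: runs of length 3, 5 and 4
nro <- c(1:3, 1:5, 1:4)

# Base R stand-in for group_by(nro) + row_number():
# number the occurrences of each nro value in order of appearance
ind <- ave(seq_along(nro), nro, FUN = seq_along)
print(ind)
```

Values 4 and 5 first appear in the second run, so they are numbered 1 there and the counter no longer lines up with the runs.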

For sequences of varying length

Inspired by docendo discimus's comment:

tbl <- tibble(nro = c(rep(1:3, 1), rep(1:5, 1), rep(1:4, 1)))

tbl %>% 
  mutate(ind = cumsum(nro == 1))

However, this only works when every sequence begins with 1, because cumsum() counts just the TRUE values of nro == 1.
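A small sketch of that limitation, using made-up data (the vector here is hypothetical, not from the question) in which only the last run starts at 1:

```r
# Hypothetical data: three runs (2,3,4 | 2,3 | 1,2); only the last starts at 1
nro <- c(2, 3, 4, 2, 3, 1, 2)

# Counting the 1s misses the first two restarts entirely
ind_ones <- cumsum(nro == 1)
print(ind_ones)
```

The first five elements all get 0 even though they span two separate runs, so the indicator is not usable for this kind of data.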

A more general approach is to start a new group whenever nro decreases:

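tbl %>% 
  mutate(dif = nro - lag(nro)) %>%               # change from the previous row (NA in row 1)
  mutate(dif = ifelse(is.na(dif), nro, dif)) %>% # replace the NA in row 1 with nro
  mutate(ind = cumsum(dif < 0) + 1) %>%          # every decrease starts a new run
  select(-dif)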

# A tibble: 12 x 2
#      nro   ind
#    <int> <dbl>
#  1     1     1
#  2     2     1
#  3     3     1
#  4     1     2
#  5     2     2
#  6     3     2
#  7     4     2
#  8     5     2
#  9     1     3
# 10     2     3
# 11     3     3
# 12     4     3
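The same idea can probably be written more compactly. The sketch below is a base R variant using diff() (my substitution, not the answer's code), shown on a plain vector so it runs without any packages:

```r
# Question's data: runs of length 3, 5 and 4
nro <- c(1:3, 1:5, 1:4)

# diff() gives nro[i] - nro[i - 1]; a negative value marks a restart.
# Prepending FALSE keeps the result the same length as nro, and the
# final + 1L makes the first run number 1.
ind <- cumsum(c(FALSE, diff(nro) < 0)) + 1L
print(ind)
```

Inside a dplyr pipeline the expression should work unchanged as mutate(ind = cumsum(c(FALSE, diff(nro) < 0)) + 1L). Like the version above, it only detects a restart when the value drops, so a run that restarts at a value equal to or higher than the previous one would not be split.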

Upvotes: 4
