user3585829

Reputation: 965

grouped event chain ID in tidyverse

I'm attempting to create an ID column for my data frame that counts a sequence of events and can't figure out where I'm going wrong.

The data looks like this:

Data

library(tidyverse)

df <- tribble(
  ~group, ~value,
  "a", 4,
  "a", 3,
  "a", 10,
  "b", 2,
  "b", 4,
  "a", 20,
  "a", 14,
  "a", 12,
  "a", 9,
  "b", 66,
  "b", 23,
  "b", 48)

Things I've tried...

I tried cur_group_id(), but that only returns one fixed ID per group (1 for every "a" row, 2 for every "b" row) rather than counting each run of a group:

df %>%
  group_by(group) %>%
  mutate(ID = cur_group_id()) %>%
  as.data.frame()

   group value expectedID ID
1      a     4          1  1
2      a     3          1  1
3      a    10          1  1
4      b     2          1  2
5      b     4          1  2
6      a    20          2  1
7      a    14          2  1
8      a    12          2  1
9      a     9          2  1
10     b    66          2  2
11     b    23          2  2
12     b    48          2  2

I've also tried seq_along(), which gets me a bit closer to what I want, but it is really a running count of the rows within each group, like row_number(), rather than a count of runs.

df %>%
  group_by(group) %>%
  mutate(ID = seq_along(group)) %>%
  as.data.frame()

   group value expectedID ID
1      a     4          1  1
2      a     3          1  2
3      a    10          1  3
4      b     2          1  1
5      b     4          1  2
6      a    20          2  4
7      a    14          2  5
8      a    12          2  6
9      a     9          2  7
10     b    66          2  3
11     b    23          2  4
12     b    48          2  5

My desired output

What I'd really like it to look like is this:

df$expectedID <- c(1,1,1,1,1,2,2,2,2,2,2,2)

# A tibble: 12 x 3
   group value expectedID
   <chr> <dbl>      <dbl>
 1 a         4          1
 2 a         3          1
 3 a        10          1
 4 b         2          1
 5 b         4          1
 6 a        20          2
 7 a        14          2
 8 a        12          2
 9 a         9          2
10 b        66          2
11 b        23          2
12 b        48          2

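Basically, if the lagged group is the same as the current group, retain the count. If the lagged group is different from the current group, a new run of that group has started, so that group's count should increase by one. In other words, the ID is the number of runs of the current group seen so far: the second run of "a" and the second run of "b" both get ID 2.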

Upvotes: 0

Views: 57

Answers (1)

s_baldur

Reputation: 33488

Here is one option, (ab)using rle() with data.table::rowid():

# rle() returns `lengths` (plural); `.$length` only works through partial matching
df$id <- 
  rle(df$group) %>% {rep(data.table::rowid(.$values), times = .$lengths)}
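
To unpack what the one-liner does, the same steps can be spelled out one at a time. This is only a sketch under a few assumptions: it rebuilds the `df` from the question, uses base R's `ave()` as a stand-in for `data.table::rowid()`, and then shows a dplyr-only variant in which `run_start` and `df_dplyr` are helper names introduced purely for illustration.

```r
library(dplyr)

df <- tibble::tribble(
  ~group, ~value,
  "a", 4,
  "a", 3,
  "a", 10,
  "b", 2,
  "b", 4,
  "a", 20,
  "a", 14,
  "a", 12,
  "a", 9,
  "b", 66,
  "b", 23,
  "b", 48)

# Step 1: run-length encode the group column.
runs <- rle(df$group)
# runs$values  is "a" "b" "a" "b"
# runs$lengths is  3   2   4   3

# Step 2: number each run within its own group value
# (base R stand-in for data.table::rowid(runs$values)).
run_no <- ave(seq_along(runs$values), runs$values, FUN = seq_along)
# run_no is 1 1 2 2

# Step 3: expand the per-run numbers back to one value per row.
df$id <- rep(run_no, times = runs$lengths)

# dplyr-only variant: flag the first row of every run, then take a
# cumulative sum of those flags within each group.
df_dplyr <- df %>%
  select(group, value) %>%
  mutate(run_start = row_number() == 1 | group != dplyr::lag(group)) %>%
  group_by(group) %>%
  mutate(id = cumsum(run_start)) %>%
  ungroup() %>%
  select(-run_start)
```

Both versions should reproduce the `expectedID` column from the question. The dplyr variant avoids the data.table dependency at the cost of a temporary helper column.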

Upvotes: 1
