Reputation: 965
I'm attempting to create an ID column for my data frame that counts a sequence of events and can't figure out where I'm going wrong.
The data looks like this:
data
library(tidyverse)
df <- tribble(
~group, ~value,
"a", 4,
"a", 3,
"a", 10,
"b", 2,
"b", 4,
"a", 20,
"a", 14,
"a", 12,
"a", 9,
"b", 66,
"b", 23,
"b", 48)
Things I've tried...
I tried to use cur_group_id(), but that only seems to return one fixed ID per group, rather than one per run of the group:
df %>%
group_by(group) %>%
mutate(ID = cur_group_id()) %>%
as.data.frame()
group value expectedID ID
1 a 4 1 1
2 a 3 1 1
3 a 10 1 1
4 b 2 1 2
5 b 4 1 2
6 a 20 2 1
7 a 14 2 1
8 a 12 2 1
9 a 9 2 1
10 b 66 2 2
11 b 23 2 2
12 b 48 2 2
I've also tried seq_along(), which gets me a bit closer to what I want, but it is more of a running count of the rows within each group, like row_number():
df %>%
group_by(group) %>%
mutate(ID = seq_along(group)) %>%
as.data.frame()
group value expectedID ID
1 a 4 1 1
2 a 3 1 2
3 a 10 1 3
4 b 2 1 1
5 b 4 1 2
6 a 20 2 4
7 a 14 2 5
8 a 12 2 6
9 a 9 2 7
10 b 66 2 3
11 b 23 2 4
12 b 48 2 5
My desired output
What I'd really like it to look like is this:
df$expectedID <- c(1,1,1,1,1,2,2,2,2,2,2,2)
# A tibble: 12 x 3
group value expectedID
<chr> <dbl> <dbl>
1 a 4 1
2 a 3 1
3 a 10 1
4 b 2 1
5 b 4 1
6 a 20 2
7 a 14 2
8 a 12 2
9 a 9 2
10 b 66 2
11 b 23 2
12 b 48 2
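Basically, if the lagged group is the same as the current group, retain the count. If the lagged group is different from the current group, begin a new run. Each time a group reappears after being interrupted by a different group, increase that group's count by one, so the ID is the number of the current run within its group.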
Upvotes: 0
Views: 57
Reputation: 33488
Here is one option, (ab)using rle() with data.table::rowid():
df$id <-
  rle(df$group) %>% {rep(data.table::rowid(.$values), times = .$lengths)}
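To see why this works, here is a rough base R sketch of the same idea, broken into steps. It uses ave() with seq_along() as a stand-in for data.table::rowid() (my substitution, so it runs without data.table), and the names runs, run_id and id are only illustrative:

```r
# Group column from the question
group <- c("a", "a", "a", "b", "b", "a", "a", "a", "a", "b", "b", "b")

# 1. Collapse consecutive repeats into runs:
#    values  = "a" "b" "a" "b", lengths = 3 2 4 3
runs <- rle(group)

# 2. Number each run within its own group value
#    (assumed base R equivalent of data.table::rowid(runs$values))
run_id <- ave(seq_along(runs$values), runs$values, FUN = seq_along)

# 3. Repeat each run's number once per row in that run
id <- rep(run_id, times = runs$lengths)
```

With the question's data, id should line up with expectedID, so it can be assigned directly with df$id <- id.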
Upvotes: 1