Reputation: 99
I have a dataset with multiple observations nested within individuals. This example dataset includes columns for id and for day of the week (dayweek, 1-7). I have observations from 3 days from each individual. So one individual might have only submitted reports for Sun/Wed/Thu (1, 4, 5), and the other might have submitted reports for Sun/Mon/Tue (1, 2, 3), as in this example:
df <- data.frame(
id = c(rep(1:2, each = 6),2),
dayweek = c(rep(c(1, 4, 5), each = 2),rep(c(1, 2, 3), each = 2), 3)
)
I want to set up a column that marks each individual's first, second, and third day, like this:
df2 <- data.frame(
id = c(rep(1:2, each = 6),2),
dayweek = c(rep(c(1, 4, 5), each = 2),rep(c(1, 2, 3), each = 2), 3),
daynum = c(rep(1:3, each = 2, times = 2), 3)
)
I tried using
df %>% group_indices(id, dayweek)
but this produces a new id for each individual-day combination. What's a good way to do this?
Thanks in advance!
Upvotes: 3
Views: 4712
Reputation: 389175
We can group_by id and create a unique id for each dayweek, numbered in order of first appearance within the group:
library(dplyr)
df %>%
group_by(id) %>%
mutate(daynum = as.integer(factor(dayweek, levels = unique(dayweek))))
# id dayweek daynum
# <dbl> <dbl> <int>
# 1 1 1 1
# 2 1 1 1
# 3 1 4 2
# 4 1 4 2
# 5 1 5 3
# 6 1 5 3
# 7 2 1 1
# 8 2 1 1
# 9 2 2 2
#10 2 2 2
#11 2 3 3
#12 2 3 3
#13 2 3 3
In base R, we can use ave for the same:
with(df, ave(dayweek, id, FUN = function(x)
as.integer(factor(x, levels = unique(x)))))
#[1] 1 1 2 2 3 3 1 1 2 2 3 3 3
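As a side note, here is a minimal sketch of what I believe is an equivalent base R shortcut. It is not part of the answer above: match(x, unique(x)) looks up each value's position among the distinct values in order of first appearance, which is the same numbering the factor trick produces for this data.

```r
# Example data from the question
df <- data.frame(
  id = c(rep(1:2, each = 6), 2),
  dayweek = c(rep(c(1, 4, 5), each = 2), rep(c(1, 2, 3), each = 2), 3)
)

# Within each id, number the days by order of first appearance.
# unique(x) keeps first-appearance order; match() returns each value's position in it.
df$daynum <- ave(df$dayweek, df$id, FUN = function(x) match(x, unique(x)))

df$daynum
# [1] 1 1 2 2 3 3 1 1 2 2 3 3 3
```

Note that ave() returns a vector of the same type as its first argument, so daynum is a double here; wrap it in as.integer() if an integer column is needed.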
Upvotes: 4
Reputation: 42564
According to OP's comment, the rows are in order.
Given that, here are two different approaches which also handle the "Friday, Saturday, Sunday" case (dayweek 6, 7, 1) mentioned in the comments:
rleid()
fct_inorder()
rleid()
This uses the rleid() function from the data.table package:
library(dplyr)
df2 %>%
group_by(id) %>%
mutate(daynum2 = data.table::rleid(dayweek))
      id dayweek daynum daynum2
   <dbl>   <dbl>  <dbl>   <int>
 1     1       1      1       1
 2     1       1      1       1
 3     1       4      2       2
 4     1       4      2       2
 5     1       5      3       3
 6     1       5      3       3
 7     2       1      1       1
 8     2       1      1       1
 9     2       2      2       2
10     2       2      2       2
11     2       3      3       3
12     2       3      3       3
13     2       3      3       3
14     3       6      1       1
15     3       7      2       2
16     3       1      3       3
Note that an extended data set is used which also covers the "Friday, Saturday, Sunday" case (dayweek 6, 7, 1).
fct_inorder()
This is an enhanced version of Ronak's answer which also handles the "Friday, Saturday, Sunday" case. It uses fct_inorder() from the forcats package, which orders factor levels by first appearance.
df2 %>%
group_by(id) %>%
mutate(daynum2 =
dayweek %>%
as.character() %>%
forcats::fct_inorder() %>%
as.integer()
)
The output is the same as above.
This is the extended data set, which also includes the "Friday, Saturday, Sunday" case (dayweek 6, 7, 1):
df2 <- data.frame(
id = c(rep(1:2, each = 6), 2, rep(3, 3)),
dayweek = c(rep(c(1, 4, 5), each = 2),rep(c(1, 2, 3), each = 2), 3, 6, 7, 1),
daynum = c(rep(1:3, each = 2, times = 2), 3, 1:3)
)
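One caveat worth spelling out: the two approaches only agree when each day's rows are contiguous, which is what the "rows are in order" assumption gives you. The sketch below uses made-up values (not from the question) and base R stand-ins for the two ideas, so it runs without data.table or forcats: a run-length id built from rle() in place of rleid(), and match(x, unique(x)) in place of the first-appearance factor codes.

```r
# Hypothetical dayweek values for one individual where day 6 shows up again
# after day 7, i.e. the rows are NOT grouped by day.
x <- c(6, 6, 7, 6, 1)

# Run-length id (base R stand-in for the rleid() idea): a new number for every run.
r <- rle(x)
run_id <- rep(seq_along(r$lengths), r$lengths)

# First-appearance id (stand-in for the fct_inorder() idea): a new number for every distinct value.
first_id <- match(x, unique(x))

run_id
# [1] 1 1 2 3 4
first_id
# [1] 1 1 2 1 3
```

So if the same day can reappear later for an individual, the run-length approach counts it as a new day while the first-appearance approach reuses the earlier number. For the data in the question both give the same result.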
Upvotes: 3
Reputation: 2022
dplyr
Using cumsum and !duplicated with dplyr:
df %>%
group_by(id) %>%
mutate(daynum = cumsum(!duplicated(dayweek)))
# A tibble: 13 x 3
# Groups: id [2]
id dayweek daynum
<dbl> <dbl> <int>
1 1 1 1
2 1 1 1
3 1 4 2
4 1 4 2
5 1 5 3
6 1 5 3
7 2 1 1
8 2 1 1
9 2 2 2
10 2 2 2
11 2 3 3
12 2 3 3
13 2 3 3
Using tapply from base R:
unlist(tapply(df$dayweek, df$id, function(x) cumsum(!duplicated(x))))
1 1 2 2 3 3 1 1 2 2 3 3 3
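A small caveat on the tapply version, shown as a sketch with made-up data (not from the question): unlist(tapply(...)) returns the values grouped by id, so it only lines up with the rows when the data frame is already sorted by id. Swapping in ave() with the same cumsum(!duplicated(x)) function puts each value back in its original row position.

```r
# Hypothetical data where the ids are interleaved rather than sorted
d <- data.frame(
  id      = c(1, 2, 1, 2),
  dayweek = c(1, 1, 4, 1)
)

count_days <- function(x) cumsum(!duplicated(x))

# tapply + unlist: results come back grouped by id (all of id 1, then all of id 2)
by_group <- unname(unlist(tapply(d$dayweek, d$id, count_days)))

# ave: results are written back to the rows they came from
by_row <- ave(d$dayweek, d$id, FUN = count_days)

by_group
# [1] 1 2 1 1
by_row
# [1] 1 1 2 1
```

With the question's data, which is sorted by id, both give the same vector.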
Upvotes: 6