Aziggy

Reputation: 99

How to create group indices for nested groups in R

I have a dataset with multiple observations nested within individuals. This example dataset includes columns for id and for day of the week (dayweek, 1-7). I have observations from 3 days from each individual. So one individual might have only submitted reports for Sun/Wed/Thu (1, 4, 5), and the other might have submitted reports for Sun/Mon/Tue (1, 2, 3), as in this example:

df <- data.frame(
  id = c(rep(1:2, each = 6),2),
  dayweek = c(rep(c(1, 4, 5), each = 2),rep(c(1, 2, 3), each = 2), 3)
)

I want to set up a column that marks each individual's first, second, and third day, like this:

df2 <- data.frame(
  id = c(rep(1:2, each = 6),2),
  dayweek = c(rep(c(1, 4, 5), each = 2),rep(c(1, 2, 3), each = 2), 3),
  daynum = c(rep(1:3, each = 2, times = 2), 3)
)

I tried using

df %>% group_indices(id, dayweek) 

but this produces a new id for each individual-day combination. What's a good way to do this?

Thanks in advance!

Upvotes: 3

Views: 4712

Answers (3)

Ronak Shah

Reputation: 389175

We could group_by id and create a unique id for each dayweek, numbered in order of first appearance within each id:

library(dplyr)

df %>%
  group_by(id) %>%
  mutate(daynum = as.integer(factor(dayweek, levels = unique(dayweek))))

#      id dayweek daynum
#   <dbl>   <dbl>  <int>
# 1     1       1      1
# 2     1       1      1
# 3     1       4      2
# 4     1       4      2
# 5     1       5      3
# 6     1       5      3
# 7     2       1      1
# 8     2       1      1
# 9     2       2      2
#10     2       2      2
#11     2       3      3
#12     2       3      3
#13     2       3      3

In base R, we can use ave to do the same:

with(df, ave(dayweek, id, FUN = function(x) 
         as.integer(factor(x, levels = unique(x)))))
#[1] 1 1 2 2 3 3 1 1 2 2 3 3 3
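
The levels = unique(dayweek) argument is what makes the numbering follow order of first appearance rather than sorted order. A minimal sketch on a hypothetical vector (Fri/Sat/Sun coded as 6, 7, 1, which is not part of the df above) may help show the difference. The match() line is an equivalent I am adding for comparison, not something taken from the answer:

```r
# Hypothetical dayweek values for one individual: Fri, Fri, Sat, Sun
x <- c(6, 6, 7, 1)

# Default factor levels are sorted (1, 6, 7), so codes follow sorted order
sorted_codes <- as.integer(factor(x))

# Levels taken in order of first appearance (6, 7, 1)
appearance_codes <- as.integer(factor(x, levels = unique(x)))

# match() against unique() gives the same numbering without a factor
match_codes <- match(x, unique(x))

print(sorted_codes)
print(appearance_codes)
print(match_codes)
```

Only the second and third results number the days as first, second, third in the order they were reported.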

Upvotes: 4

Uwe

Reputation: 42564

According to OP's comment, the rows are in order.

Given that, here are two approaches that also handle the "Friday, Saturday, Sunday" case (dayweek 6, 7, 1) mentioned in the comments:

  1. rleid()
  2. fct_inorder()

rleid()

This uses the rleid() function from the data.table package:

library(dplyr)
df2 %>% 
  group_by(id) %>% 
  mutate(daynum2 = data.table::rleid(dayweek)) 
      id dayweek daynum daynum2
   <dbl>   <dbl>  <dbl>   <int>
 1     1       1      1       1
 2     1       1      1       1
 3     1       4      2       2
 4     1       4      2       2
 5     1       5      3       3
 6     1       5      3       3
 7     2       1      1       1
 8     2       1      1       1
 9     2       2      2       2
10     2       2      2       2
11     2       3      3       3
12     2       3      3       3
13     2       3      3       3
14     3       6      1       1
15     3       7      2       2
16     3       1      3       3

Note that an extended data set is used which also covers the "Friday, Saturday, Sunday" case (dayweek 6, 7, 1).
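
One thing worth keeping in mind is that rleid() numbers runs of consecutive equal values, which is why it relies on the rows being in order. The sketch below uses a base R stand-in for rleid() (my own substitution, so it runs without data.table) on a hypothetical vector where a day reappears after a gap, and compares it with numbering by first appearance:

```r
# Hypothetical out-of-order dayweek values for one individual
x <- c(1, 1, 4, 1)

# Base R stand-in for data.table::rleid(): start a new id at every change
run_id <- cumsum(c(TRUE, x[-1] != x[-length(x)]))

# Numbering by first appearance instead
first_seen_id <- match(x, unique(x))

print(run_id)
print(first_seen_id)
```

When the rows are sorted within each id, as the OP states, both give the same result. They only diverge when a day value recurs non-contiguously.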

fct_inorder()

This is an enhanced version of Ronak's answer that also handles the "Friday, Saturday, Sunday" case. It uses fct_inorder() from the forcats package, which reorders factor levels by first appearance.

df2 %>% 
  group_by(id) %>% 
  mutate(daynum2 = 
           dayweek %>% 
           as.character() %>% 
           forcats::fct_inorder() %>% 
           as.integer()
         ) 

The output is the same as above.

Data

This is an extended data set that also includes the "Friday, Saturday, Sunday" case (dayweek 6, 7, 1):

df2 <- data.frame(
  id = c(rep(1:2, each = 6), 2, rep(3, 3)),
  dayweek = c(rep(c(1, 4, 5), each = 2),rep(c(1, 2, 3), each = 2), 3, 6, 7, 1),
  daynum = c(rep(1:3, each = 2, times = 2), 3, 1:3)
)

Upvotes: 3

nsinghphd

Reputation: 2022

dplyr

Using cumsum and !duplicated with dplyr, so the counter increases each time a new dayweek value appears within an id:

library(dplyr)

df %>%
  group_by(id) %>%
  mutate(daynum = cumsum(!duplicated(dayweek)))


# A tibble: 13 x 3
# Groups:   id [2]
      id dayweek daynum
   <dbl>   <dbl>  <int>
 1     1       1      1
 2     1       1      1
 3     1       4      2
 4     1       4      2
 5     1       5      3
 6     1       5      3
 7     2       1      1
 8     2       1      1
 9     2       2      2
10     2       2      2
11     2       3      3
12     2       3      3
13     2       3      3
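
To see why this works, it may help to look at the two steps separately. The following is a minimal sketch on a hypothetical vector for a single individual (the values are illustrative, not taken from df):

```r
# Hypothetical dayweek values for one individual: Fri, Fri, Sat, Sun, Sun
x <- c(6, 6, 7, 1, 1)

# TRUE wherever a value is seen for the first time
is_new <- !duplicated(x)

# Running count of first-time values gives the day number
daynum <- cumsum(is_new)

print(is_new)
print(daynum)
```

Because the counter only moves forward, this assumes the rows for each day are contiguous within an id, which matches the data in the question.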

tapply from base R

unlist(tapply(df$dayweek, df$id, function(x) cumsum(!duplicated(x))))

 1  1  2  2  3  3  1  1  2  2  3  3  3 
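
A caveat with the tapply version: the result comes back grouped by id, so it lines up with the rows of df only when df is already sorted by id (as it is here). The sketch below uses a hypothetical data frame with interleaved ids (df_mixed and count_days are names I made up for illustration) and compares it with ave, which keeps the original row order:

```r
# Hypothetical data where ids are interleaved rather than sorted
df_mixed <- data.frame(id = c(2, 1, 2, 1), dayweek = c(1, 1, 2, 4))

# Helper: running count of distinct values in order of appearance
count_days <- function(x) cumsum(!duplicated(x))

# tapply() returns all results for id 1, then all results for id 2
by_group <- unlist(tapply(df_mixed$dayweek, df_mixed$id, count_days),
                   use.names = FALSE)

# ave() writes each result back to the row it came from
by_row <- ave(df_mixed$dayweek, df_mixed$id, FUN = count_days)

print(by_group)
print(by_row)
```

If the data might not be sorted by id, ave (or the dplyr version) is probably the safer choice for adding the result as a column.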

Upvotes: 6
