Cherry Lim

Formatting longitudinal data in R

I have a dataset in R as below:

id <- c(1,1,1,1,1,2,2,2,2,3,3)
time <- c(2000,2001,2002,2003,2004,2000,2001,2002,2003,2000,2001)
group <- c(0,0,1,0,0,0,1,0,1,0,0)
df_temp <- data.frame(id, time, group)

and would like to create a new variable called "n" that counts rows within each run of "group", restarting every time "group" switches from 0 to 1 or from 1 to 0 (the expected values below also restart at each new id):

n <- c(1,2,1,1,2,1,1,1,1,1,2)
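Put differently, n counts rows within each run of consecutive identical (id, group) pairs. As a minimal base-R sketch (not dplyr, and assuming the rows are already sorted by id and then time, as they are here), rle() gives the length of each run and sequence() expands those lengths into within-run counters:

```r
id <- c(1,1,1,1,1,2,2,2,2,3,3)
time <- c(2000,2001,2002,2003,2004,2000,2001,2002,2003,2000,2001)
group <- c(0,0,1,0,0,0,1,0,1,0,0)
df_temp <- data.frame(id, time, group)

# Runs of identical (id, group) pairs; assumes rows are sorted by id, then time
runs <- rle(paste(df_temp$id, df_temp$group))

# sequence() turns run lengths such as c(2, 1, 2, ...) into 1..length per run
df_temp$n <- sequence(runs$lengths)

df_temp$n
# [1] 1 2 1 1 2 1 1 1 1 1 2
```

Pasting id and group together is what makes the counter restart at a new id even when group does not change (row 6).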

Could you suggest how I could generate the variable "n" using the dplyr package? Thanks very much in advance.

I tried:

library(dplyr)

df_temp2 <-
   df_temp %>%
   arrange(id, time, group) %>%
   group_by(group) %>%
   mutate(n = seq_along(group))

but "n" does not come out as I expected.
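The attempt goes wrong because group_by(group) pools every row with the same value of group, wherever it sits in the data, so there is one long counter for all the 0 rows and another for all the 1 rows. For this data (already sorted by id and time), a base-R equivalent using ave() in place of dplyr shows the effect:

```r
group <- c(0,0,1,0,0,0,1,0,1,0,0)

# Same as group_by(group) %>% mutate(n = seq_along(group)) on this data:
# one counter shared by all 0 rows, another shared by all 1 rows
n_attempt <- ave(seq_along(group), group, FUN = seq_along)

n_attempt
# [1] 1 2 1 3 4 5 2 6 3 7 8
```

Grouping needs to be by run (a block of consecutive equal values), not by value.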

Answers (1)

one

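library(dplyr)

df_temp %>%
  arrange(id, time) %>%                                               # runs are defined in time order within each id
  group_by(id, grp = cumsum(group != lag(group, default = TRUE))) %>% # grp increases by 1 at every switch in group
  mutate(n = row_number()) %>%                                        # count rows within each (id, run)
  ungroup() %>%
  select(-grp)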

      id  time group     n
   <dbl> <dbl> <dbl> <int>
 1     1  2000     0     1
 2     1  2001     0     2
 3     1  2002     1     1
 4     1  2003     0     1
 5     1  2004     0     2
 6     2  2000     0     1
 7     2  2001     1     1
 8     2  2002     0     1
 9     2  2003     1     1
10     3  2000     0     1
11     3  2001     0     2
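To see why id is also needed in group_by(), it helps to look at the intermediate grp column. The base-R sketch below mirrors cumsum(group != lag(group, default = TRUE)), using c(1, head(group, -1)) as a stand-in for lag() (TRUE is coerced to 1 for a numeric column), and ave() in place of row_number() within groups:

```r
id <- c(1,1,1,1,1,2,2,2,2,3,3)
group <- c(0,0,1,0,0,0,1,0,1,0,0)

# Stand-in for lag(group, default = TRUE): shift down by one, fill the first slot with 1
lagged <- c(1, head(group, -1))

# A new run starts wherever group differs from the previous row
grp <- cumsum(group != lagged)
grp
# [1] 1 1 2 3 3 3 4 5 6 7 7

# Rows 5 and 6 share grp 3 but belong to different ids, so grouping by
# (id, grp) rather than grp alone is what restarts n at id 2
n <- ave(seq_along(grp), id, grp, FUN = seq_along)
n
# [1] 1 2 1 1 2 1 1 1 1 1 2
```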
