vino88
vino88

Reputation: 163

Sequence a column based on two other columns with a restarting sequence

I'm working a dataframe similar to the one below where I want to count or have an id based on the name and year, but that restarts when name changes. I've looked at a lot of posts (1, 2, 3), but most want a non-restarting identifier/counter.

I thought group_by and seq_along would work, but seq_along only takes one variable so this doesn't work

df1 <- df %>% 
    group_by(name, year) %>%
    arrange(year) %>%
    mutate(
        exp = seq_along(c(name, year)))

I've also tried to use various data.table sequences with .GRP such as the following to no avail

df1 <- data.table(df, key="name,year")
df1 <- df1[, Year_id :=.GRP, by=key(df1)]

Here's a sample dataframe.

df <- data.frame(
  name = rep(c("A","B"), each=5), 
  year=rep(2000:2001, times=5), 
  stringsAsFactors=FALSE)

Desired output:

       name year     Year_id
1        A 2000         1
2        A 2000         1
3        A 2000         1
4        A 2001         2
5        A 2001         2
6        B 2000         1
7        B 2000         1
8        B 2000         1
9        B 2001         2
10       B 2001         2

Upvotes: 0

Views: 1116

Answers (2)

G. Grothendieck
G. Grothendieck

Reputation: 269586

1) dplyr Create a factor and extract its levels:

library(dplyr)
df %>% 
   arrange(name, year) %>% 
   group_by(name) %>%
   mutate(Year_id = as.numeric(factor(year))) %>%
   ungroup()

giving:

# A tibble: 10 x 3
    name  year Year_id
   <chr> <int>   <dbl>
1      A  2000       1
2      A  2000       1
3      A  2000       1
4      A  2001       2
5      A  2001       2
6      B  2000       1
7      B  2000       1
8      B  2001       2
9      B  2001       2
10     B  2001       2

1a) The mutate could alternately be written as mutate(Year_id = match(year, unique(year))) as per @nicola's comment.

2) no packages Without package it could be written:

o <- with(df, order(name, year))
transform(df[o, ], Year_id = ave(year, name, FUN = function(x) as.numeric(factor(x))))

or using match.

Upvotes: 3

Rentrop
Rentrop

Reputation: 21497

What about

dat %>% 
  group_by(name) %>% 
  arrange(year) %>% 
  mutate(id = cumsum(c(1L, diff(year))))

Which gives:

Source: local data frame [10 x 4]
Groups: name [2]

     name  year Year_id    id
   <fctr> <int>   <int> <int>
1       A  2000       1     1
2       A  2000       1     1
3       A  2000       1     1
4       A  2001       2     2
5       A  2001       2     2
6       B  2000       1     1
7       B  2000       1     1
8       B  2000       1     1
9       B  2001       2     2
10      B  2001       2     2

Upvotes: 2

Related Questions