Reputation: 163
I'm working a dataframe similar to the one below where I want to count or have an id based on the name and year, but that restarts when name changes. I've looked at a lot of posts (1, 2, 3), but most want a non-restarting identifier/counter.
I thought group_by and seq_along would work, but seq_along only takes one variable so this doesn't work
df1 <- df %>%
group_by(name, year) %>%
arrange(year) %>%
mutate(
exp = seq_along(c(name, year)))
I've also tried to use various data.table sequences with .GRP such as the following to no avail
df1 <- data.table(df, key="name,year")
df1 <- df1[, Year_id :=.GRP, by=key(df1)]
Here's a sample dataframe.
df <- data.frame(
name = rep(c("A","B"), each=5),
year=rep(2000:2001, times=5),
stringsAsFactors=FALSE)
Desired output:
name year Year_id
1 A 2000 1
2 A 2000 1
3 A 2000 1
4 A 2001 2
5 A 2001 2
6 B 2000 1
7 B 2000 1
8 B 2000 1
9 B 2001 2
10 B 2001 2
Upvotes: 0
Views: 1116
Reputation: 269586
1) dplyr Create a factor and extract its levels:
library(dplyr)
df %>%
arrange(name, year) %>%
group_by(name) %>%
mutate(Year_id = as.numeric(factor(year))) %>%
ungroup()
giving:
# A tibble: 10 x 3
name year Year_id
<chr> <int> <dbl>
1 A 2000 1
2 A 2000 1
3 A 2000 1
4 A 2001 2
5 A 2001 2
6 B 2000 1
7 B 2000 1
8 B 2001 2
9 B 2001 2
10 B 2001 2
1a) The mutate
could alternately be written as mutate(Year_id = match(year, unique(year)))
as per @nicola's comment.
2) no packages Without package it could be written:
o <- with(df, order(name, year))
transform(df[o, ], Year_id = ave(year, name, FUN = function(x) as.numeric(factor(x))))
or using match
.
Upvotes: 3
Reputation: 21497
What about
dat %>%
group_by(name) %>%
arrange(year) %>%
mutate(id = cumsum(c(1L, diff(year))))
Which gives:
Source: local data frame [10 x 4]
Groups: name [2]
name year Year_id id
<fctr> <int> <int> <int>
1 A 2000 1 1
2 A 2000 1 1
3 A 2000 1 1
4 A 2001 2 2
5 A 2001 2 2
6 B 2000 1 1
7 B 2000 1 1
8 B 2000 1 1
9 B 2001 2 2
10 B 2001 2 2
Upvotes: 2