Reputation: 477
I am struggling to work out the logic needed to build a counter/index that distinguishes genuine matches from non-genuine matches. A simplified example of my data is as follows:
ID track
x 10
x 10
x 3
x 3
x 1
y 2
The final data frame I wish to get is as follows:
ID Track Counter
x 10 1
x 10 1
x 3 2
x 3 2
x 1 3
y 2 1
Hence, whenever the ID is the same and the track is the same, put a counter in the Counter column (starting with 1); whenever the ID is the same but the track changes, increment the counter by 1, and so on. When a new ID comes up, the counter starts from 1 again.
Any advice would be great.
Upvotes: 1
Views: 345
Reputation: 2283
@Julius' answer works if you have no repeating tracks. If a track can revert to a previous value, that answer will not increment the counter at that point. If this happens in your data and you need the counter to increment when it does, I would suggest using lag from dplyr.
library(dplyr)
# Within each ID, add 1 every time track differs from the previous row
df %>%
  group_by(ID) %>%
  mutate(count = cumsum(track != lag(track, default = track[1])) + 1)
Results with a couple more datapoints:
# A tibble: 8 x 3
# Groups: ID [2]
# ID track count
# <fct> <int> <dbl>
# 1 x 10 1
# 2 x 10 1
# 3 x 3 2
# 4 x 3 2
# 5 x 1 3
# 6 x 3 4
# 7 x 3 4
# 8 y 2 1
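If you would rather not depend on dplyr, the same idea (start a new count whenever the value differs from the row directly above it) can be sketched in base R. This is only a minimal sketch: the data frame below is rebuilt by hand from the results above, and `run_counter` is a hypothetical helper name, not a function from any package.

```r
# Example data, rebuilt by hand: track value 3 reappears later within ID "x".
df <- data.frame(
  ID    = c("x", "x", "x", "x", "x", "x", "x", "y"),
  track = c(10, 10, 3, 3, 1, 3, 3, 2)
)

# Hypothetical helper: number the runs of consecutive identical values.
run_counter <- function(v) {
  # TRUE wherever a value differs from the one directly before it;
  # the first element never counts as a change.
  changed <- c(FALSE, v[-1] != v[-length(v)])
  cumsum(changed) + 1
}

# ave() applies the helper separately within each ID and keeps the row order.
df$count <- ave(df$track, df$ID, FUN = run_counter)
df$count
# [1] 1 1 2 2 3 4 4 1
```

As with the lag version, this assumes the rows are already in the order in which the counter should run.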
Upvotes: 1
Reputation: 48191
You may use
library(tidyverse)
data %>% group_by(ID) %>% mutate(Counter = cumsum(!duplicated(track)))
The trick is to use !duplicated to flag entries that have not been seen before within the group, and cumsum to turn those flags into a counter. E.g.,
!duplicated(data$track[1:5])
# [1] TRUE FALSE TRUE FALSE TRUE
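To see the two steps combined, here is a minimal sketch on a plain vector. The values are typed in by hand to match the first five rows of the question, and the variable names are illustrative only.

```r
# Tracks for ID "x" from the question, typed in by hand.
track <- c(10, 10, 3, 3, 1)

# TRUE the first time each value appears, FALSE on later occurrences.
first_seen <- !duplicated(track)

# The running total of first appearances is the counter.
counter <- cumsum(first_seen)
counter
# [1] 1 1 2 2 3

# If a value reappears later, it is no longer "new", so the counter stays put.
reverting <- cumsum(!duplicated(c(10, 10, 3, 3, 1, 3, 3)))
reverting
# [1] 1 1 2 2 3 3 3
```

So this approach counts distinct values seen so far, which matches the requested output as long as a track never returns to an earlier value within the same ID.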
Upvotes: 2