Reputation: 187
I have a dataframe with 3 columns. The id of each individual, the number of group they belong (gr) and location codes (loc). What I am trying to do is identify which individuals visit 2 locations with the following sequence: Location 1 -> Location 2 -> Location 1.
Dummy dataset:
id <- c(1,1,1,1,1,1,1,2,2,2,2,2,4,4,4,4,4,4,4,4)
gr<-c(1,1,1,1,1,1,1,1,1,1,1,1,1,4,4,4,4,4,4,4)
loc <- c(5,5,4,4,5,5,5,3,3,3,3,2,2,2,2,3,3,2,2,2)
df<- data.frame(id,gr, loc)
I have tried using a diff
function, to identify differences between the locations:
dif<- diff(as.numeric(df$loc))
But I can't find any other way to move forward. In addition this approach doesn't account for the groups of each individual (and the ids repeat between groups). I was thinking maybe using a lag
function but not sure how or if it helps at all. Any recommendations? Many thanks in advance, I'm still pretty new in R.
Desired output:
id<- c(1,4)
gr<- c(1,4)
out<- data.frame(cbind(id, gr))
Upvotes: 1
Views: 50
Reputation: 102700
A possible data.table
option
unique(
setDT(df)[
,
q := rleid(loc), .(id, gr)
][
,
.SD[uniqueN(q) == 3 & first(loc) == last(loc)], .(id, gr)
][
,
.(id, gr)
]
)
gives
id gr
1: 1 1
2: 4 4
Upvotes: 1
Reputation: 887881
May be this works
library(dplyr)
library(data.table)
df %>%
group_by(id) %>%
filter(n_distinct(rleid(loc)) >2) %>%
slice_tail(n = 1) %>%
select(-loc) %>%
ungroup
# A tibble: 2 x 2
# id gr
# <dbl> <dbl>
#1 1 1
#2 4 4
Upvotes: 1