Reputation: 665
I have a panel dataset in long format that looks something like this:
idpers <- c(1040, 1040, 1041, 1041, 1041, 1232, 1277, 1277, 1277, 1277)
wave <- c(2012, 2013, 2012, 2013, 2014, 2011, 2011, 2012, 2013, 2014)
df <- data.frame(idpers, wave)
where idpers is an interviewee ID and wave indicates the wave/year in which the survey was conducted.
I would like to test the effect of a treatment that took place in, say, 2013, so I want to subset my data frame to only those participants who have both pre- and post-treatment observations. That is, I want to keep a row for a given idpers only if that idpers has rows both before and after/during the 2013 wave. I tried plenty of things like this:
df %>%
  group_by(idpers) %>%
  filter(wave %in% c(2011, 2012, 2013, 2014))
But this keeps every row whose wave is any of those values, regardless of which other waves that participant has.
I hope that was clear and I'm happy to give more details! Thanks a lot!
Upvotes: 0
Views: 306
Reputation: 388817
I think you are looking for:
library(dplyr)
df %>% group_by(idpers) %>% filter(any(wave < 2013) && any(wave > 2013))
# idpers wave
# <dbl> <dbl>
#1 1041 2012
#2 1041 2013
#3 1041 2014
#4 1277 2011
#5 1277 2012
#6 1277 2013
#7 1277 2014
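This keeps every idpers that has at least one wave before 2013 and at least one wave after 2013. Because any() is evaluated once per group, the result is a single TRUE or FALSE for each idpers, so all of that participant's rows are kept or dropped together.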
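The question mentions waves "after/during" 2013. If the 2013 wave itself should count as post-treatment, the second condition would use >= instead of >. Here is a minimal sketch of that variant, assuming the example data from the question; treat_year and df_both are names introduced here for illustration only.

```r
library(dplyr)

# Example data from the question
idpers <- c(1040, 1040, 1041, 1041, 1041, 1232, 1277, 1277, 1277, 1277)
wave   <- c(2012, 2013, 2012, 2013, 2014, 2011, 2011, 2012, 2013, 2014)
df <- data.frame(idpers, wave)

# Assumption: the treatment year itself counts as post-treatment
treat_year <- 2013

df_both <- df %>%
  group_by(idpers) %>%
  # keep a participant only if they have at least one wave before the
  # treatment year and at least one wave in or after it
  filter(any(wave < treat_year), any(wave >= treat_year)) %>%
  ungroup()
```

With this definition, idpers 1040 (waves 2012 and 2013) is also kept, while 1232 (only 2011) is still dropped. Which version is right depends on whether the 2013 interviews took place after the treatment.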
Upvotes: 3