AntVal

Reputation: 665

Subset data for specific values in a variable in R

I have a panel dataset in long format that looks something like this:

idpers <- c(1040, 1040, 1041, 1041, 1041, 1232, 1277, 1277, 1277, 1277)
wave <- c(2012, 2013, 2012, 2013, 2014, 2011, 2011, 2012, 2013, 2014)
df <- data.frame(idpers, wave)

where idpers is an interviewee id, and wave indicates the wave/year in which the survey was conducted.

I would like to test the effect of a treatment that took place in, say, 2013, so I want to subset my data frame to only those participants who have both pre- and post-treatment observations. That is, I want to keep the rows for an idpers only if that same idpers has rows both before the 2013 wave and after/during it. I tried plenty of things like this:

df %>%
  group_by(idpers) %>%
  filter(wave %in% c(2011, 2012, 2013, 2014))

But this keeps every row whose wave is one of those values, regardless of which other waves exist for that idpers.

I hope that was clear and I'm happy to give more details! Thanks a lot!

Upvotes: 0

Views: 306

Answers (1)

Ronak Shah

Reputation: 388817

I think you are looking for:

library(dplyr)
df %>% group_by(idpers) %>% filter(any(wave < 2013) && any(wave > 2013))

#  idpers  wave
#   <dbl> <dbl>
#1   1041  2012
#2   1041  2013
#3   1041  2014
#4   1277  2011
#5   1277  2012
#6   1277  2013
#7   1277  2014

This keeps every idpers that has at least one wave before 2013 and at least one wave after 2013.
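
The question mentions "after/during" the 2013 wave, so if the treatment wave itself should count as a post-treatment observation, a variant with >= may be closer to what is needed. This is only a minimal sketch under that assumption; treat_year and pre_post are illustrative names that do not appear in the original post.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  idpers = c(1040, 1040, 1041, 1041, 1041, 1232, 1277, 1277, 1277, 1277),
  wave   = c(2012, 2013, 2012, 2013, 2014, 2011, 2011, 2012, 2013, 2014)
)

# Assumed treatment wave (hypothetical variable name)
treat_year <- 2013

# Keep an idpers only if it has at least one wave before the treatment
# and at least one wave in or after the treatment year
pre_post <- df %>%
  group_by(idpers) %>%
  filter(any(wave < treat_year) && any(wave >= treat_year)) %>%
  ungroup()

unique(pre_post$idpers)
```

With this condition idpers 1040 (waves 2012 and 2013) is kept as well, whereas the strict > version above drops it. Which one is right depends on whether the 2013 wave was collected before or after the treatment.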

Upvotes: 3
