Delete observations base on specific value of variable in R

Question

I'm reposting this with more information on what I want. I'm fairly new in R. I have a database (panel) and I want to delete some observations based on certain values. Let's take the next panel as an example (derived from plm packages):

Panel <-read.dta("http://dss.princeton.edu/training/Panel101.dta")
> head(Panel)
  country year           y y_bin        x1         x2          x3   opinion op
1       A 1990  1342787840     1 0.2779036 -1.1079559  0.28255358 Str agree  1
2       A 1991 -1899660544     0 0.3206847 -0.9487200  0.49253848     Disag  0
3       A 1992   -11234363     0 0.3634657 -0.7894840  0.70252335     Disag  0
4       A 1993  2645775360     1 0.2461440 -0.8855330 -0.09439092     Disag  0
5       A 1994  3008334848     1 0.4246230 -0.7297683  0.94613063     Disag  0
6       A 1995  3229574144     1 0.4772141 -0.7232460  1.02968037 Str agree  1

I want to delete the observations for the next year following OP=1. For instance if in 1990, OP =1, I want to exclude country in 1991, 1992, 1992, etc (all the next years of the database). If OP =1 in 1996, I want to exclude country in 1997, 1998 and 1999. I want also to keep observations that have OP=0 all the time. I'm running a logit model. Therefore, I'm "comparing" OP=1 and OP=0.

PS : The dataframe may be not be a good example but in my dataframe, OP = 1 only once.

Does anyone know how I can do that?

Thanks in advance.

MKR · Accepted Answer

Given this sample data:

df <- structure(list(country = c("a", "a", "a", "a", "b", "b", "b"), 
    year = c(91, 92, 93, 94, 91, 92, 93), op = c(1, 0, 0, 1, 
    0, 0, 0)), row.names = c(NA, -7L), class = c("tbl_df", "tbl", 
"data.frame"))

# A tibble: 7 x 3
  country  year    op
      
1 a          91     1
2 a          92     0
3 a          93     0
4 a          94     1
5 b          91     0
6 b          92     0
7 b          93     0

You can subset it by first creating a separate column has_1 that tells you whether there is op == 1 in a given country in any year. Then you can filter for (op == 1 & has_1 == 1) | (op == 0 & has_1 == 0).

df %>% group_by(country) %>% 
  mutate(has_1 = max(op)) %>% 
  filter(
    (op == 1 & has_1 ==1) | (op ==0 & has_1 == 0)
  )

Which results in:

# A tibble: 5 x 4
# Groups:   country [2]
  country  year    op has_1
       
1 a          91     1     1
2 a          94     1     1
3 b          91     0     0
4 b          92     0     0
5 b          93     0     0

Delete observations base on specific value of variable in R

Answers (2)

Related Questions