Reputation: 23
I'm fairly new to R. I have a panel dataset and I want to delete some observations based on certain values. Let's take the following panel as an example (derived from the plm package):
library(foreign)
Panel <- read.dta("http://dss.princeton.edu/training/Panel101.dta")
> head(Panel)
country year y y_bin x1 x2 x3 opinion op
1 A 1990 1342787840 1 0.2779036 -1.1079559 0.28255358 Str agree 1
2 A 1991 -1899660544 0 0.3206847 -0.9487200 0.49253848 Disag 0
3 A 1992 -11234363 0 0.3634657 -0.7894840 0.70252335 Disag 0
4 A 1993 2645775360 1 0.2461440 -0.8855330 -0.09439092 Disag 0
5 A 1994 3008334848 1 0.4246230 -0.7297683 0.94613063 Disag 0
6 A 1995 3229574144 1 0.4772141 -0.7232460 1.02968037 Str agree 1
I want to delete the observations for all following years once OP = 1. For instance, if OP = 1 in 1990, I want to exclude that country in 1991, 1992, 1993, etc. (all the later years in the database). If OP = 1 in 1996, I want to exclude the country in 1997, 1998 and 1999.
PS: This data frame may not be a good example, but in my own data frame OP = 1 occurs only once.
Does anyone know how I can do that?
Thanks in advance.
EDIT: I forgot to say that I also want to keep the observations that have OP = 0 in every year. I'm running a logit model, so I'm comparing OP = 1 and OP = 0.
Upvotes: 0
Views: 74
Reputation: 23
Your answers were great, but I forgot to specify something in the question. Your answers let me keep the observations that had OP = 1, but I also want to keep those that have OP = 0 in every year. I'm running a logit model. For instance, those with OP = 0 would be the non-adopters and those with OP = 1 the adopters.
Upvotes: 0
Reputation: 887431
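We can use slice:
library(dplyr)
# Load the example panel first so the pipeline below runs as written
Panel <- foreign::read.dta("http://dss.princeton.edu/training/Panel101.dta")
Panel %>%
  group_by(country) %>%
  # keep rows from the first one up to and including the first op == 1
  slice(seq_len(match(1, op))) %>%
  ungroup()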
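To see what the slice call does inside each group, here is a minimal base R sketch on a made-up op vector for a single country. The vector op_one_country is hypothetical and not taken from Panel101. match(1, op) gives the position of the first 1, and seq_len() turns that position into the row indices to keep.

```r
# Hypothetical op values for one country, ordered by year (illustrative only)
op_one_country <- c(0, 0, 1, 0, 0)

# Position of the first 1 in the vector
first_adoption <- match(1, op_one_country)

# Row indices from 1 up to and including the first 1
rows_to_keep <- seq_len(first_adoption)

# match() returns NA when the vector contains no 1 at all
no_adoption <- match(1, c(0, 0, 0))
```

Because match() returns NA for a group without any 1, seq_len() has no valid length to work with there, so this approach assumes every country has op equal to 1 at least once.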
Upvotes: 1
Reputation: 389095
I am assuming you want to remove all the rows after the first 1 in op for each country separately.
Using dplyr with filter:
library(dplyr)
Panel <- foreign::read.dta("http://dss.princeton.edu/training/Panel101.dta")
Panel %>%
  group_by(country) %>%
  filter(row_number() <= match(1, op)) %>%
  ungroup()
# country year y y_bin x1 x2 x3 opinion op
# <fct> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <fct> <dbl>
# 1 A 1990 1342787840 1 0.278 -1.11 0.283 Str agree 1
# 2 B 1990 -5934699520 0 -0.0818 1.43 0.0234 Agree 1
# 3 C 1990 -1292379264 0 1.31 -1.29 0.204 Agree 1
# 4 D 1990 1883025152 1 -0.314 1.74 0.647 Disag 0
# 5 D 1991 6037768704 1 0.360 2.13 1.10 Disag 0
# 6 D 1992 10244189 1 0.0519 1.68 0.970 Str agree 1
# 7 E 1990 1342787840 1 0.453 1.73 0.597 Str disag 0
# 8 E 1991 2296009472 1 0.419 1.71 0.793 Str agree 1
# 9 F 1990 1342787840 1 -0.568 -0.347 1.26 Str agree 1
#10 G 1990 1342787840 1 0.945 -1.52 1.45 Str disag 0
#11 G 1991 -1518985728 0 1.10 -1.46 1.44 Agree 1
Or the same thing with slice:
Panel %>%
  group_by(country) %>%
  slice(seq_len(match(1, op))) %>%
  ungroup()
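The follow-up asks to also keep countries where op is 0 in every year. With match(1, op) such a group yields NA, so its rows are not kept. One possible way around this, sketched here in base R on a made-up toy panel (the toy data frame and its values are assumptions for illustration, not Panel101), is to count earlier adoptions per country and keep every row that has none before it.

```r
# Toy panel (hypothetical values): A adopts in 1991, B adopts in 1990,
# C never adopts
toy <- data.frame(
  country = rep(c("A", "B", "C"), each = 4),
  year    = rep(1990:1993, times = 3),
  op      = c(0, 1, 0, 0,   1, 0, 0, 0,   0, 0, 0, 0)
)

# For each country, count how many times op was 1 in earlier rows
# (rows are assumed to be sorted by year within country)
prior_adoptions <- ave(toy$op, toy$country, FUN = function(v) cumsum(v) - v)

# Keep a row only if no adoption happened before it; never-adopters keep all rows
kept <- toy[prior_adoptions == 0, ]
```

Within the dplyr pipeline, the same idea could probably be expressed by giving match() a fallback, for example filter(row_number() <= match(1, op, nomatch = n())), so that a country without any op == 1 keeps all of its rows. Treat that variant as an untested suggestion.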
Upvotes: 1