Reputation: 2351
I have a data.table (pooled cross section) as follows:
library(data.table)
DT <- fread(
"ID country year Event
1 NLD 1998 0
2 NLD 1998 0
3 NLD 1998 0
4 NLD 2002 0
5 NLD 2002 0
6 NLD 2006 1
7 NLD 2006 1
8 NLD 2006 1
9 GBR 2002 0
10 GBR 2002 0
11 GBR 2002 0
12 GBR 2006 1
13 GBR 2006 1
14 GRC 2001 0
15 GRC 2007 0
16 USA 2001 1
17 USA 2001 1
18 USA 2007 0",
header = TRUE)
I want to create a subset with the following condition;
If the Event
is 1
for a combination of country
and year
, and
the Event
is 0
for the previous year
of that `country, take both rows.
I tried to do the following:
DT <- DT[Event==1 & shift(Event==0), ,by=c("country ","year")]
DT <- DT[Event==1 & shift(Event)==0,, by=c("country ","year")]
But that is not allowed.
Warning message:
In `[.data.table`(DT, Event== 1 & shift(Event) == :
Ignoring by= because j= is not supplied
What would be the right way to do this?
Desired output:
DT <- fread(
"ID country year Event
4 NLD 2002 0
5 NLD 2002 0
6 NLD 2006 1
7 NLD 2006 1
8 NLD 2006 1
9 GBR 2002 0
10 GBR 2002 0
11 GBR 2002 0
12 GBR 2006 1
13 GBR 2006 1",
header = TRUE)
Upvotes: 1
Views: 1299
Reputation: 83275
A possible solution:
DT[DT[DT[, .I[Event == 1 & shift(Event, fill = 1) == 0] + (-1:0)
, by = .(country)]$V1
, .(country, year)]
, on = .(country, year)]
which gives:
ID country year Event 1: 4 NLD 2002 0 2: 5 NLD 2002 0 3: 6 NLD 2006 1 4: 7 NLD 2006 1 5: 8 NLD 2006 1 6: 9 GBR 2002 0 7: 10 GBR 2002 0 8: 11 GBR 2002 0 9: 12 GBR 2006 1 10: 13 GBR 2006 1
Explanation:
With DT[, .I[Event == 1 & shift(Event, fill = 1) == 0] + (-1:0), by = .(country)]
we look up the rows that meet the condition. This results in the following table with rownumbers by country that meet the condition:
country V1
1: NLD 5
2: NLD 6
3: GBR 11
4: GBR 12
Next, with $V1
the rownumbers are extracted and used to subset DT
with only the columns country
and year
in:
DT[DT[, .I[Event == 1 & shift(Event, fill = 1) == 0] + (-1:0)
, by = .(country)]$V1
, .(country, year)]
which results in:
country year
1: NLD 2002
2: NLD 2006
3: GBR 2002
4: GBR 2006
Finally, this is joined with the original DT
on the country
and year
columns with on = .(country, year)
to only keep the countries for which the condition applies.
Upvotes: 1