Tom
Tom

Reputation: 2351

Subsetting on a condition for two rows at the same time

I have a data.table (pooled cross section) as follows:

library(data.table)
DT <- fread(
"ID country year Event
1   NLD   1998  0
2   NLD   1998  0
3   NLD   1998  0
4   NLD   2002  0
5   NLD   2002  0
6   NLD   2006  1
7   NLD   2006  1
8   NLD   2006  1
9   GBR   2002  0
10  GBR   2002  0
11  GBR   2002  0
12  GBR   2006  1
13  GBR   2006  1
14  GRC   2001  0
15  GRC   2007  0
16  USA   2001  1   
17  USA   2001  1   
18  USA   2007  0",
header = TRUE)

I want to create a subset with the following condition;

If the Event is 1 for a combination of country and year, and the Event is 0 for the previous year of that `country, take both rows.

I tried to do the following:

DT <- DT[Event==1 & shift(Event==0), ,by=c("country ","year")]
DT <- DT[Event==1 & shift(Event)==0,, by=c("country ","year")]

But that is not allowed.

Warning message:
In `[.data.table`(DT, Event== 1 & shift(Event) ==  :
  Ignoring by= because j= is not supplied

What would be the right way to do this?

Desired output:

DT <- fread(
"ID country year Event
4   NLD   2002  0
5   NLD   2002  0
6   NLD   2006  1
7   NLD   2006  1
8   NLD   2006  1
9   GBR   2002  0
10  GBR   2002  0
11  GBR   2002  0
12  GBR   2006  1
13  GBR   2006  1",
header = TRUE)

Upvotes: 1

Views: 1299

Answers (1)

Jaap
Jaap

Reputation: 83275

A possible solution:

DT[DT[DT[, .I[Event == 1 & shift(Event, fill = 1) == 0] + (-1:0)
         , by = .(country)]$V1
      , .(country, year)]
   , on = .(country, year)]

which gives:

    ID country year Event
 1:  4     NLD 2002     0
 2:  5     NLD 2002     0
 3:  6     NLD 2006     1
 4:  7     NLD 2006     1
 5:  8     NLD 2006     1
 6:  9     GBR 2002     0
 7: 10     GBR 2002     0
 8: 11     GBR 2002     0
 9: 12     GBR 2006     1
10: 13     GBR 2006     1

Explanation:

  • With DT[, .I[Event == 1 & shift(Event, fill = 1) == 0] + (-1:0), by = .(country)] we look up the rows that meet the condition. This results in the following table with rownumbers by country that meet the condition:

       country V1
    1:     NLD  5
    2:     NLD  6
    3:     GBR 11
    4:     GBR 12
    
  • Next, with $V1 the rownumbers are extracted and used to subset DT with only the columns country and year in:

    DT[DT[, .I[Event == 1 & shift(Event, fill = 1) == 0] + (-1:0)
          , by = .(country)]$V1
       , .(country, year)]
    

    which results in:

       country year
    1:     NLD 2002
    2:     NLD 2006
    3:     GBR 2002
    4:     GBR 2006
    
  • Finally, this is joined with the original DT on the country and year columns with on = .(country, year) to only keep the countries for which the condition applies.

Upvotes: 1

Related Questions