Number of successive rows

Question

Consider a data.table that contains matches between id1, id2 for several years.

        id1 year      id2
1: 51557094 2003 65122111
2: 51557094 2004 65122111
3: 51557094 2005 65122111
4: 51557094 2007 65122111
5: 51557094 2008 65122111
6: 51557093 2006 65122111

For any of these matches, I want to find out the duration, together with the year the match started. If there was no data for a specific year, a match ends (and the next year, if there is data again, a new match starts).

Hence, for the sample data above, the expected output would be

        id1 year      id2 length
1: 51557094 2003 65122111      3
2: 51557094 2007 65122111      2
3: 51557093 2006 65122111      1

I accepted one of the answers because it brought me far enough, but notice that it was not correct. While it worked for the sample data, it will not for the following

> dtId
        id1 year      id2
1: 51557094 2003 65122111
2: 51557094 2004 65122111
3: 51557094 2005 65122111
4: 51557094 2007 65122111
5: 51557094 2008 65122111
6: 51557094 2006 65122112

> setkey(dtId, id1, id2, year)
> dtId[,grp := cumsum(c(1,diff(year)) > 1),by=id1]
> dtId[,list(year=year[1],length=length(year)),by=list(id1,id2,grp)]
        id1      id2 grp year length
1: 51557094 65122111   0 2003      5
2: 51557094 65122112   0 2006      1

Instead, creating the match variable grp over both id1, id2 solves the issue:

> dtId[,grp := cumsum(c(1,diff(year)) > 1),by=list(id1, id2)]
> dtId[,list(year=year[1],length=length(year)),by=list(id1,id2,grp)]
        id1      id2 grp year length
1: 51557094 65122111   0 2003      3
2: 51557094 65122112   0 2006      1
3: 51557094 65122111   1 2007      2

thelatemail · Accepted Answer

An attempt:

dat[,grp := cumsum(c(1,diff(year)) > 1),by=list(id1,id2)]
dat[,list(year=year[1],length=length(year)),by=list(id1,id2,grp)]

#        id1      id2 grp year length
#1: 51557094 65122111   0 2003      3
#2: 51557094 65122111   1 2007      2
#3: 51557093 65122111   0 2006      1

Number of successive rows

Answers (2)

Related Questions