SaveTheDream
SaveTheDream

Reputation: 89

Partial or Total Overlap of two Date periods

I'm trying to know which observations are active during specific episodes of time

My real objective is to know which Diagnosis (dx) are active during a Pregnancy period (9 months) by a patient. It has to be considered that a patient can have different number of pregnancies along her life and also a different number of diagnosis (the dx can or can't be active ).

I have tried foverlaps like here or here but weren't exactly what I was looking for. The real problem is this and it's well documented but not for R. capture

Here they make it work for SQL I think. So I hope it's solvable...

I also tried Non-Equi Joins like this but I'm not able to make it work the way I want...

Let's get into it:

I have one DB with patient (id) Diagnosis (dx), the time it was registered (InD_dx) and the time it ended (EndD_dx) like this:

id <- rep("a", 11)
InD_dx <- as.Date(c("2005-10-04","2005-10-06","2005-10-06", "2008-04-07", "2010-05-10", "2012-04-24", "2012-04-24", "2012-05-15", "2014-03-20", "2014-04-22", "2017-11-30"), format = "%Y-%m-%d")
EndD_dx <- as.Date(c("2020-12-31","2020-12-31","2020-12-31", "2008-11-05", "2011-01-17", "2015-07-20", "2013-01-01", "2015-07-20", "2015-12-04", "2020-12-31", "2020-12-31"), format = "%Y-%m-%d")
dx <- c("A", "B", "C", "P", "P", "D", "P", "E", "F", "G", "H")

DT = data.table(id,InD_dx,EndD_dx, dx)

    DT
    id   InD_dx    EndD_dx   dx
 1:  a 2005-10-04 2020-12-31  A
 2:  a 2005-10-06 2020-12-31  B
 3:  a 2005-10-06 2020-12-31  C
 4:  a 2008-04-07 2008-11-05  P
 5:  a 2010-05-10 2011-01-17  P
 6:  a 2012-04-24 2015-07-20  D
 7:  a 2012-04-24 2013-01-01  P
 8:  a 2012-05-15 2015-07-20  E
 9:  a 2014-03-20 2015-12-04  F
10:  a 2014-04-22 2020-12-31  G
11:  a 2017-11-30 2020-12-31  H

I selected the dx of pregnancy (P) and made a DB apart. And a DB with the dx different from P like this:

Pregnancies <- copy(DT[dx== "P"])
Pregnancies 

    id InD_dx     EndD_dx   dx
 1: a 2008-04-07 2008-11-05 P
 2: a 2010-05-10 2011-01-17 P
 3: a 2012-04-24 2013-01-01 P

Dx_Other_than_Pregnancies <- copy(DT[dx!= "P"])
Dx_Other_than_Pregnancies

   id InD_dx     EndD_dx    dx
 1: a 2005-10-04 2020-12-31 A
 2: a 2005-10-06 2020-12-31 B
 3: a 2005-10-06 2020-12-31 C
 4: a 2012-04-24 2015-07-20 D
 5: a 2012-05-15 2015-07-20 E
 6: a 2014-03-20 2015-12-04 F
 7: a 2014-04-22 2020-12-31 G
 8: a 2017-11-30 2020-12-31 H

I was expecting to merge them and group active diagnosis by pregnancy. Thats why I attemped with foverlap:

setkey(Dx_Other_than_Pregnancies, id, InD_dx, EndD_dx)

foverlaps(Pregnancies, Dx_Other_than_Pregnancies,
          by.x=c('id', 'InD_dx', 'EndD_dx'), type='within', mult= "all", nomatch = 0L)

and Non-Equi Joins:

   Dx_Other_than_Pregnancies[Pregnancies, on=.(id, dx_Ini<=dxIni , EndD_dx>=EndD_dx)]

and both end up like:

     id     InD_dx    EndD_dx dx   i.InD_dx  i.EndD_dx i.dx
 1:  a 2005-10-04 2020-12-31  A 2008-04-07 2008-11-05    P
 2:  a 2005-10-06 2020-12-31  B 2008-04-07 2008-11-05    P
 3:  a 2005-10-06 2020-12-31  C 2008-04-07 2008-11-05    P
 4:  a 2005-10-04 2020-12-31  A 2010-05-10 2011-01-17    P
 5:  a 2005-10-06 2020-12-31  B 2010-05-10 2011-01-17    P
 6:  a 2005-10-06 2020-12-31  C 2010-05-10 2011-01-17    P
 7:  a 2005-10-04 2020-12-31  A 2012-04-24 2013-01-01    P
 8:  a 2005-10-06 2020-12-31  B 2012-04-24 2013-01-01    P
 9:  a 2005-10-06 2020-12-31  C 2012-04-24 2013-01-01    P
10:  a 2012-04-24 2015-07-20  D 2012-04-24 2013-01-01    P

In the case with Non Equi Joins i.InD_dx and i.EndD_dx it's not outputed and EndD_dx becomes i.EndD_dx

Desired outcome

    id   InD_dx    EndD_dx   dx   i.InD_dx  i.EndD_dx i.dx
 1:  a 2005-10-04 2020-12-31  A 2008-04-07 2008-11-05    P
 2:  a 2005-10-06 2020-12-31  B 2008-04-07 2008-11-05    P
 3:  a 2005-10-06 2020-12-31  C 2008-04-07 2008-11-05    P
 4:  a 2005-10-04 2020-12-31  A 2010-05-10 2011-01-17    P
 5:  a 2005-10-06 2020-12-31  B 2010-05-10 2011-01-17    P
 6:  a 2005-10-06 2020-12-31  C 2010-05-10 2011-01-17    P
 7:  a 2005-10-04 2020-12-31  A 2012-04-24 2013-01-01    P
 8:  a 2005-10-06 2020-12-31  B 2012-04-24 2013-01-01    P
 9:  a 2005-10-06 2020-12-31  C 2012-04-24 2013-01-01    P
10:  a 2012-04-24 2015-07-20  D 2012-04-24 2013-01-01    P
11:  a 2012-05-15 2015-07-20  E 2012-04-24 2013-01-01    P

I dunno if I'm complicating things a little bit too much. Probably with a join 1:multiple and making differences between dates I would obtain what I want. This way it's efficient enough?

Any help would be appreciatated

Thanks in advance!

Upvotes: 0

Views: 170

Answers (1)

Waldi
Waldi

Reputation: 41210

type = 'within' is excluding some partial overlaps you're looking for.

Try:

DT = data.table(id,InD_dx,EndD_dx, dx)

setkey(DT,id,InD_dx,EndD_dx)

foverlaps(DT[dx=='P'],DT[dx!='P'],
          by.x = c("id", "InD_dx", "EndD_dx"),
          by.y = c("id", "InD_dx", "EndD_dx"))

   id     InD_dx    EndD_dx dx   i.InD_dx  i.EndD_dx i.dx
 1:  a 2005-10-04 2020-12-31  A 2008-04-07 2008-11-05    P
 2:  a 2005-10-06 2020-12-31  B 2008-04-07 2008-11-05    P
 3:  a 2005-10-06 2020-12-31  C 2008-04-07 2008-11-05    P
 4:  a 2005-10-04 2020-12-31  A 2010-05-10 2011-01-17    P
 5:  a 2005-10-06 2020-12-31  B 2010-05-10 2011-01-17    P
 6:  a 2005-10-06 2020-12-31  C 2010-05-10 2011-01-17    P
 7:  a 2005-10-04 2020-12-31  A 2012-04-24 2013-01-01    P
 8:  a 2005-10-06 2020-12-31  B 2012-04-24 2013-01-01    P
 9:  a 2005-10-06 2020-12-31  C 2012-04-24 2013-01-01    P
10:  a 2012-04-24 2015-07-20  D 2012-04-24 2013-01-01    P
11:  a 2012-05-15 2015-07-20  E 2012-04-24 2013-01-01    P

Upvotes: 1

Related Questions