Reputation: 89
I'm trying to know which observations are active during specific episodes of time
My real objective is to know which Diagnosis (dx) are active during a Pregnancy period (9 months) by a patient. It has to be considered that a patient can have different number of pregnancies along her life and also a different number of diagnosis (the dx can or can't be active ).
I have tried foverlaps like here or here but weren't exactly what I was looking for. The real problem is this and it's well documented but not for R.
Here they make it work for SQL I think. So I hope it's solvable...
I also tried Non-Equi Joins like this but I'm not able to make it work the way I want...
Let's get into it:
I have one DB with patient (id) Diagnosis (dx), the time it was registered (InD_dx) and the time it ended (EndD_dx) like this:
id <- rep("a", 11)
InD_dx <- as.Date(c("2005-10-04","2005-10-06","2005-10-06", "2008-04-07", "2010-05-10", "2012-04-24", "2012-04-24", "2012-05-15", "2014-03-20", "2014-04-22", "2017-11-30"), format = "%Y-%m-%d")
EndD_dx <- as.Date(c("2020-12-31","2020-12-31","2020-12-31", "2008-11-05", "2011-01-17", "2015-07-20", "2013-01-01", "2015-07-20", "2015-12-04", "2020-12-31", "2020-12-31"), format = "%Y-%m-%d")
dx <- c("A", "B", "C", "P", "P", "D", "P", "E", "F", "G", "H")
DT = data.table(id,InD_dx,EndD_dx, dx)
DT
id InD_dx EndD_dx dx
1: a 2005-10-04 2020-12-31 A
2: a 2005-10-06 2020-12-31 B
3: a 2005-10-06 2020-12-31 C
4: a 2008-04-07 2008-11-05 P
5: a 2010-05-10 2011-01-17 P
6: a 2012-04-24 2015-07-20 D
7: a 2012-04-24 2013-01-01 P
8: a 2012-05-15 2015-07-20 E
9: a 2014-03-20 2015-12-04 F
10: a 2014-04-22 2020-12-31 G
11: a 2017-11-30 2020-12-31 H
I selected the dx of pregnancy (P) and made a DB apart. And a DB with the dx different from P like this:
Pregnancies <- copy(DT[dx== "P"])
Pregnancies
id InD_dx EndD_dx dx
1: a 2008-04-07 2008-11-05 P
2: a 2010-05-10 2011-01-17 P
3: a 2012-04-24 2013-01-01 P
Dx_Other_than_Pregnancies <- copy(DT[dx!= "P"])
Dx_Other_than_Pregnancies
id InD_dx EndD_dx dx
1: a 2005-10-04 2020-12-31 A
2: a 2005-10-06 2020-12-31 B
3: a 2005-10-06 2020-12-31 C
4: a 2012-04-24 2015-07-20 D
5: a 2012-05-15 2015-07-20 E
6: a 2014-03-20 2015-12-04 F
7: a 2014-04-22 2020-12-31 G
8: a 2017-11-30 2020-12-31 H
I was expecting to merge them and group active diagnosis by pregnancy. Thats why I attemped with foverlap:
setkey(Dx_Other_than_Pregnancies, id, InD_dx, EndD_dx)
foverlaps(Pregnancies, Dx_Other_than_Pregnancies,
by.x=c('id', 'InD_dx', 'EndD_dx'), type='within', mult= "all", nomatch = 0L)
and Non-Equi Joins:
Dx_Other_than_Pregnancies[Pregnancies, on=.(id, dx_Ini<=dxIni , EndD_dx>=EndD_dx)]
and both end up like:
id InD_dx EndD_dx dx i.InD_dx i.EndD_dx i.dx
1: a 2005-10-04 2020-12-31 A 2008-04-07 2008-11-05 P
2: a 2005-10-06 2020-12-31 B 2008-04-07 2008-11-05 P
3: a 2005-10-06 2020-12-31 C 2008-04-07 2008-11-05 P
4: a 2005-10-04 2020-12-31 A 2010-05-10 2011-01-17 P
5: a 2005-10-06 2020-12-31 B 2010-05-10 2011-01-17 P
6: a 2005-10-06 2020-12-31 C 2010-05-10 2011-01-17 P
7: a 2005-10-04 2020-12-31 A 2012-04-24 2013-01-01 P
8: a 2005-10-06 2020-12-31 B 2012-04-24 2013-01-01 P
9: a 2005-10-06 2020-12-31 C 2012-04-24 2013-01-01 P
10: a 2012-04-24 2015-07-20 D 2012-04-24 2013-01-01 P
In the case with Non Equi Joins i.InD_dx and i.EndD_dx it's not outputed and EndD_dx becomes i.EndD_dx
Desired outcome
id InD_dx EndD_dx dx i.InD_dx i.EndD_dx i.dx
1: a 2005-10-04 2020-12-31 A 2008-04-07 2008-11-05 P
2: a 2005-10-06 2020-12-31 B 2008-04-07 2008-11-05 P
3: a 2005-10-06 2020-12-31 C 2008-04-07 2008-11-05 P
4: a 2005-10-04 2020-12-31 A 2010-05-10 2011-01-17 P
5: a 2005-10-06 2020-12-31 B 2010-05-10 2011-01-17 P
6: a 2005-10-06 2020-12-31 C 2010-05-10 2011-01-17 P
7: a 2005-10-04 2020-12-31 A 2012-04-24 2013-01-01 P
8: a 2005-10-06 2020-12-31 B 2012-04-24 2013-01-01 P
9: a 2005-10-06 2020-12-31 C 2012-04-24 2013-01-01 P
10: a 2012-04-24 2015-07-20 D 2012-04-24 2013-01-01 P
11: a 2012-05-15 2015-07-20 E 2012-04-24 2013-01-01 P
I dunno if I'm complicating things a little bit too much. Probably with a join 1:multiple and making differences between dates I would obtain what I want. This way it's efficient enough?
Any help would be appreciatated
Thanks in advance!
Upvotes: 0
Views: 170
Reputation: 41210
type = 'within'
is excluding some partial overlaps you're looking for.
Try:
DT = data.table(id,InD_dx,EndD_dx, dx)
setkey(DT,id,InD_dx,EndD_dx)
foverlaps(DT[dx=='P'],DT[dx!='P'],
by.x = c("id", "InD_dx", "EndD_dx"),
by.y = c("id", "InD_dx", "EndD_dx"))
id InD_dx EndD_dx dx i.InD_dx i.EndD_dx i.dx
1: a 2005-10-04 2020-12-31 A 2008-04-07 2008-11-05 P
2: a 2005-10-06 2020-12-31 B 2008-04-07 2008-11-05 P
3: a 2005-10-06 2020-12-31 C 2008-04-07 2008-11-05 P
4: a 2005-10-04 2020-12-31 A 2010-05-10 2011-01-17 P
5: a 2005-10-06 2020-12-31 B 2010-05-10 2011-01-17 P
6: a 2005-10-06 2020-12-31 C 2010-05-10 2011-01-17 P
7: a 2005-10-04 2020-12-31 A 2012-04-24 2013-01-01 P
8: a 2005-10-06 2020-12-31 B 2012-04-24 2013-01-01 P
9: a 2005-10-06 2020-12-31 C 2012-04-24 2013-01-01 P
10: a 2012-04-24 2015-07-20 D 2012-04-24 2013-01-01 P
11: a 2012-05-15 2015-07-20 E 2012-04-24 2013-01-01 P
Upvotes: 1