Reputation: 1101
I have a person table of individual characteristics e.g.:
person <- data.frame(group.id = c("N","N","P"), person.id = c("A", "B", "C"), strt = c(as.Date(x = "2002-07-01"), as.Date(x = "2003-08-01"), as.Date(x = "2004-06-23")), end = c(as.Date(x = "2003-08-01"), as.Date(x = "2004-09-01"), as.Date(x = "2006-07-01")), c = 1:3, d = 3:5)
and a group table of group characteristics e.g.:
group <- data.frame(group.id = c("N", "N", "N", "O", "O", "O", "P", "P", "P"), report.date = c(as.Date(x = "2002-07-01"), as.Date(x = "2002-08-01"), as.Date(x = "2002-09-01")), a = c(1:3), b = c(4:6))
I'd like to merge them by the group.id and applicable date range e.g.:
group2 <- data.frame(group, person.id = c("A", "A", "A", NA, NA, NA, NA, NA, NA), strt = c(as.Date(x = "2002-07-01"), as.Date(x = "2002-07-01"), as.Date(x = "2002-07-01"), NA, NA, NA, NA, NA, NA), end = c(as.Date(x = "2003-08-01"), as.Date(x = "2003-08-01"), as.Date(x = "2003-08-01"), NA, NA, NA, NA, NA, NA), c = c(1, 1, 1, NA, NA, NA, NA, NA, NA), d = c(3, 3, 3, NA, NA, NA, NA, NA, NA))
group.id report.date a b person.id strt end c d 1 N 2002-07-01 1 4 A 2002-07-01 2003-08-01 1 3 2 N 2002-08-01 2 5 A 2002-07-01 2003-08-01 1 3 3 N 2002-09-01 3 6 A 2002-07-01 2003-08-01 1 3 4 O 2002-07-01 1 4 <NA> <NA> <NA> NA NA 5 O 2002-08-01 2 5 <NA> <NA> <NA> NA NA 6 O 2002-09-01 3 6 <NA> <NA> <NA> NA NA 7 P 2002-07-01 1 4 <NA> <NA> <NA> NA NA 8 P 2002-08-01 2 5 <NA> <NA> <NA> NA NA 9 P 2002-09-01 3 6 <NA> <NA> <NA> NA NA
Does anyone have a suggestion for how to do that in R?
Upvotes: 0
Views: 266
Reputation: 23200
person <- data.frame(group_id = c("N","N","P"), person_id = c("A", "B", "C"), strt = c(as.Date(x = "2002-07-01"), as.Date(x = "2003-08-01"), as.Date(x = "2004-06-23")), end = c(as.Date(x = "2003-08-01"), as.Date(x = "2004-09-01"), as.Date(x = "2006-07-01")), c = 1:3, d = 3:5)
group <- data.frame(group_id = c("N", "N", "N", "O", "O", "O", "P", "P", "P"), report_date = c(as.Date(x = "2002-07-01"), as.Date(x = "2002-08-01"), as.Date(x = "2002-09-01")), a = c(1:3), b = c(4:6))
group2 <- data.frame(group, person_id = c("A", "A", "A", NA, NA, NA, NA, NA, NA), strt = c(as.Date(x = "2002-07-01"), as.Date(x = "2002-07-01"), as.Date(x = "2002-07-01"), NA, NA, NA, NA, NA, NA), end = c(as.Date(x = "2003-08-01"), as.Date(x = "2003-08-01"), as.Date(x = "2003-08-01"), NA, NA, NA, NA, NA, NA), c = c(1, 1, 1, NA, NA, NA, NA, NA, NA), d = c(3, 3, 3, NA, NA, NA, NA, NA, NA))
library(sqldf)
sqldf("select a.*, b.* from 'group' a left join person b on a.group_id = b.group_id and (a.report_date >= b.strt and a.report_date <= b.end)")
group_id report_date a b group_id person_id strt end c d 1 N 2002-07-01 1 4 N A 2002-07-01 2003-08-01 1 3 2 N 2002-08-01 2 5 N A 2002-07-01 2003-08-01 1 3 3 N 2002-09-01 3 6 N A 2002-07-01 2003-08-01 1 3 4 O 2002-07-01 1 4 <NA> <NA> <NA> <NA> NA NA 5 O 2002-08-01 2 5 <NA> <NA> <NA> <NA> NA NA 6 O 2002-09-01 3 6 <NA> <NA> <NA> <NA> NA NA 7 P 2002-07-01 1 4 <NA> <NA> <NA> <NA> NA NA 8 P 2002-08-01 2 5 <NA> <NA> <NA> <NA> NA NA 9 P 2002-09-01 3 6 <NA> <NA> <NA> <NA> NA NA
Note that group
is a reserved word so I had to put it in single quotes to use it as a table. I also changed the .
s in the column names to _
to avoid problems, but you could've left the .
and quoted all of the column names instead.
Upvotes: 1