Reputation: 121
I have two data frmaes (say DF1 and DF2). I want to merge them based on multiple criteria. If 'state' and 'city' of DF1 match that of DF2, and 'date' of DF2 is in within four years of 'date' of DF1, then I would like to add column 'margin' from DF2 to DF1. If the conditions are not met, column 'margin' of DF1 will have value of NA.
DF1 <- structure(list(date = c("2001-02-14", "2001-06-14", "2004-03-31",
"2003-03-11", "2003-06-29"), state = c("DE", "NY", "NY", "NY",
"AZ"), city = c("Wilmington", "New York", "Buffalo", "New York",
"Phoenix"), industry = c("Retail", "Computers and Software",
"Manufacturing (Misc.)", "Healthcare and Medical", "Construction and Supplies"
), SIC = c(5331, 3571, 2541, 8063, 2421)), row.names = c(2937L,
2817L, 2117L, 2298L, 2228L), class = "data.frame")
DF2 <- structure(list(date = c("2000-11-07", "2000-11-07", "2008-11-04",
"2000-11-07", "2000-11-07", "2008-11-04", "2004-11-02", "2004-11-02",
"2008-11-04", "2012-11-06"), state = c("MA", "NY", "OH", "VA",
"CA", "DE", "NY", "NY", "NY", "AZ"), city = c("Boston", "New York",
"Cleveland", "Richmond", "Los Angeles", "Wilmington", "New York",
"Buffalo", "New York", "Phoenix"), margin = c(-3.61895488477766, -41.5805022156573, -40.2049010106604,
24.8839947364776, 17.2042747593408, -55.4514285714286, -35.5094126201826,
-61.9743406985032, -39.9718177548145, 7.47655435915248)), row.names = c(9849L,
10041L, 29268L, 11941L, 7365L, 31116L, 13227L, 17397L, 23352L,
32571L), class = "data.frame")
Upvotes: 0
Views: 60
Reputation: 43
Something like this? Depending on how you want the interval.
library(lubridate)
library(fuzzyjoin)
DF1$date <- ymd(DF1$date)
DF2$date <- ymd(DF2$date)
DF2$interval <- interval(DF2$date, DF2$date + years(4))
fuzzy_left_join(DF1, DF2,
by = c("city" = "city",
"state" = "state",
"date" = "interval"),
match_fun = c(`==`, `==`, `%within%`))
Upvotes: 1