Reputation: 176
I have two data sets, each containing five-digit ZIPs.
One data set looks like this:
From To Territory
7501 10000 Unassigned
10001 10463 Agent 1
10464 10464 Unassigned
10465 11769 Agent 2
And a second data set that looks like this:
zip5 address
1 10009 424 E 9TH ST APT 12, NEW YORK
2 10010 15 E 26TH ST APT 10C, NEW YORK
3 10013 310 GREENWICH ST, NEW YORK
4 10019 457 W 57TH ST, NEW YORK
I would like to write a for-loop in R that loops through the zip5
column in the second data set, then loops through both the From
and the To
columns from dataset 1, checking if the zip5
falls within the From
and To
range, and once it finds a match, assigns the Territory
value from the first dataset into a new column in second dataset.
I started to try to think through the logic but quickly became overwhelmed and thought I would turn to the StackOverflow community for guidance.
Here was my initial attempt:
for (i in nrow(df1)){
for(j in nrow(df2)){
if(df1[1, "zip5"] > df2[1, "From"] & df1[1, "zip5"] <= df2[1, "To"])
df1$newColumn = df2[j, "Territory"]
}
}
Upvotes: 0
Views: 35
Reputation: 160687
You can use data.table::foverlaps
for this:
library(data.table)
dat1 <- fread(text = '
From To Territory
7501 10000 Unassigned
10001 10463 "Agent 1"
10464 10464 Unassigned
10465 11769 "Agent 2"')
dat2 <- fread(text = '
zip5 address
10009 "424 E 9TH ST APT 12, NEW YORK"
10010 "15 E 26TH ST APT 10C, NEW YORK"
10013 "310 GREENWICH ST, NEW YORK"
10019 "457 W 57TH ST, NEW YORK"')
# if you use your own data and it is not a data.table, then do this:
setDT(dat1)
setDT(dat2)
Requirements to use foverlap
:
Both frames must have two fields, a "from" and a "to". While it might seem inane since we want to determine if "zip5" is within "From" to "To", the premise of the function is to find overlaps in two ranges. Instead of putting in special-case code to allow a single column in one frame, they chose (I'm inferring) to keep it general. This means we need to copy zip5
to another column.
Both tables need to have their ranges as "keys". If there are other columns that are keys, then the range columns must be the last two. (And in order.)
# req't 1, need a range in the second frame
dat2[, zip5copy := zip5 ]
# set keys for both
setkey(dat1, From, To)
setkey(dat2, zip5, zip5copy)
And the code:
foverlaps(dat1, dat2)
# zip5 address zip5copy From To Territory
# 1: NA <NA> NA 7501 10000 Unassigned
# 2: 10009 424 E 9TH ST APT 12, NEW YORK 10009 10001 10463 Agent 1
# 3: 10010 15 E 26TH ST APT 10C, NEW YORK 10010 10001 10463 Agent 1
# 4: 10013 310 GREENWICH ST, NEW YORK 10013 10001 10463 Agent 1
# 5: 10019 457 W 57TH ST, NEW YORK 10019 10001 10463 Agent 1
# 6: NA <NA> NA 10464 10464 Unassigned
# 7: NA <NA> NA 10465 11769 Agent 2
The default mode when there are no matches is nomatch=NA
, meaning that the missing columns of the extra rows are filled with NA
, as above. This is equivalent to a "full join" (one ref for joins: https://stackoverflow.com/a/6188334). If you want just matching rows, then foverlaps(..., nomatch=NULL)
will give you just 4 rows. (You can also reverse the order of dat1
and dat2
, but you might still need to use this if your actual data requires.)
Upvotes: 1