ThomasPepperz
ThomasPepperz

Reputation: 176

How to Implement a Complex For-Loop + If Statement

I have two data sets, each containing five-digit ZIPs.

One data set looks like this:

From    To      Territory

7501    10000   Unassigned
10001   10463   Agent 1
10464   10464   Unassigned
10465   11769   Agent 2

And a second data set that looks like this:

    zip5    address
1   10009   424 E 9TH ST APT 12, NEW YORK
2   10010   15 E 26TH ST APT 10C, NEW YORK
3   10013   310 GREENWICH ST, NEW YORK
4   10019   457 W 57TH ST, NEW YORK

I would like to write a for-loop in R that loops through the zip5 column in the second data set, then loops through both the From and the To columns from dataset 1, checking if the zip5 falls within the From and To range, and once it finds a match, assigns the Territory value from the first dataset into a new column in second dataset.

I started to try to think through the logic but quickly became overwhelmed and thought I would turn to the StackOverflow community for guidance.

Here was my initial attempt:

for (i in nrow(df1)){
  for(j in nrow(df2)){
    if(df1[1, "zip5"] > df2[1, "From"] & df1[1, "zip5"] <= df2[1, "To"])
      df1$newColumn = df2[j, "Territory"]
  }
}

Upvotes: 0

Views: 35

Answers (1)

r2evans
r2evans

Reputation: 160687

You can use data.table::foverlaps for this:

library(data.table)

dat1 <- fread(text = '
From    To      Territory
7501    10000   Unassigned
10001   10463   "Agent 1"
10464   10464   Unassigned
10465   11769   "Agent 2"')

dat2 <- fread(text = '
zip5    address
10009   "424 E 9TH ST APT 12, NEW YORK"
10010   "15 E 26TH ST APT 10C, NEW YORK"
10013   "310 GREENWICH ST, NEW YORK"
10019   "457 W 57TH ST, NEW YORK"')

# if you use your own data and it is not a data.table, then do this:
setDT(dat1)
setDT(dat2)

Requirements to use foverlap:

  1. Both frames must have two fields, a "from" and a "to". While it might seem inane since we want to determine if "zip5" is within "From" to "To", the premise of the function is to find overlaps in two ranges. Instead of putting in special-case code to allow a single column in one frame, they chose (I'm inferring) to keep it general. This means we need to copy zip5 to another column.

  2. Both tables need to have their ranges as "keys". If there are other columns that are keys, then the range columns must be the last two. (And in order.)

# req't 1, need a range in the second frame
dat2[, zip5copy := zip5 ]
# set keys for both
setkey(dat1, From, To)
setkey(dat2, zip5, zip5copy)

And the code:

foverlaps(dat1, dat2)
#     zip5                        address zip5copy  From    To  Territory
# 1:    NA                           <NA>       NA  7501 10000 Unassigned
# 2: 10009  424 E 9TH ST APT 12, NEW YORK    10009 10001 10463    Agent 1
# 3: 10010 15 E 26TH ST APT 10C, NEW YORK    10010 10001 10463    Agent 1
# 4: 10013     310 GREENWICH ST, NEW YORK    10013 10001 10463    Agent 1
# 5: 10019        457 W 57TH ST, NEW YORK    10019 10001 10463    Agent 1
# 6:    NA                           <NA>       NA 10464 10464 Unassigned
# 7:    NA                           <NA>       NA 10465 11769    Agent 2

The default mode when there are no matches is nomatch=NA, meaning that the missing columns of the extra rows are filled with NA, as above. This is equivalent to a "full join" (one ref for joins: https://stackoverflow.com/a/6188334). If you want just matching rows, then foverlaps(..., nomatch=NULL) will give you just 4 rows. (You can also reverse the order of dat1 and dat2, but you might still need to use this if your actual data requires.)

Upvotes: 1

Related Questions