user08041991
user08041991

Reputation: 637

merge two tables by a given rule

Consider the example where I have two datatables, df1 is a copy of my order and SOH is my iventory. I want to merge the df1$price into SOH, whereby:

if SOH$arrival_year > df1$year, then write the price associated with the oldest year, if no older year appears write NA

if the SOH item doesnt appear in df1, write NA in price

supplier <- c(1,1,1,1,1,2,2)
item <- c(20,20,20,21,22,23,26)
year <- c(2000,2002,2008,2001,2007,2005,2009)
price <- c(.3,.4,.5,1.6,1.5,3.2,.25)
df1 <- data.frame(supplier, item, year, price)
#
supplier_on_hand <- c(1,1,1,1,1,1,2,2,3)
item_on_hand <- c(20,20,20,22,20,20,23,23,10)
arrival_year <- c(2000,2001,2002,2009,2007,2012,2006,2004,2009)
SOH <- data.frame(supplier_on_hand, item_on_hand, arrival_year)

The following output is desired:

enter image description here

Upvotes: 1

Views: 72

Answers (2)

Jaap
Jaap

Reputation: 83275

Another possibility is using the rolling join ability of the data.table-package:

library(data.table)
setDT(df1)[setDT(SOH), on = .(supplier = supplier_on_hand, item = item_on_hand, year = arrival_year), roll = Inf]

# in a bit more readable format:
setDT(SOH)
setDT(df1)
df1[SOH, on = .(supplier = supplier_on_hand, item = item_on_hand, year = arrival_year), roll = Inf]

# or with setting keys first:
setDT(SOH, key = c('supplier_on_hand','item_on_hand','arrival_year'))
setDT(df1, key = c('supplier','item','year'))
df1[SOH, roll = Inf]

which gives:

   supplier item year price
1:        1   20 2000   0.3
2:        1   20 2001   0.3
3:        1   20 2002   0.4
4:        1   20 2007   0.4
5:        1   20 2012   0.5
6:        1   22 2009   1.5
7:        2   23 2004    NA
8:        2   23 2006   3.2
9:        3   10 2009    NA

Upvotes: 2

LyzandeR
LyzandeR

Reputation: 37889

The following looks like it works for me:

cbind(SOH, price =
  apply(SOH, 1, function(x) {
    #setting the item and year constraints
    temp <- df1[df1$item == x[2] & df1$year <= x[3], ]
    #order per year descending as per rules
    temp <- temp[order(temp$year, decreasing = TRUE), ]
    #set to NA if item or year does not confirm rules
    if (is.na(temp[1, 'price'])) return(NA) else return(temp[1, 'price'])
  })
)

Ouput:

  supplier_on_hand item_on_hand arrival_year price
1                1           20         2000   0.3
2                1           20         2001   0.3
3                1           20         2002   0.4
4                1           22         2009   1.5
5                1           20         2007   0.4
6                1           20         2012   0.5
7                2           23         2006   3.2
8                2           23         2004    NA
9                3           10         2009    NA

Upvotes: 1

Related Questions