avriis
avriis

Reputation: 1681

How to create ID (by) before merging in R?

I have two dataframes df.o and df.m as defined below. I need to find which observation in df.o (dimension table) corresponds which observations in df.m (fact table) based on two criteria: 1) df.o$Var1==df.o$Var1 and df.o$date1 < df.m$date2 < df.o$date3 such that I get the correct value of df.o$oID in df.m$oID (the correct value is manually entered in df.m$CORRECToID). I need the ID to complete a merge afterwards.

df.o <- data.frame(oID=1:4,
                  Var1=c("a","a","b","c"),
                  date3=c(2015,2011,2014,2015),
                  date1=c(2013,2009,2012,2013),
                  stringsAsFactors=FALSE)
df.m <- data.frame(mID=1:3,
                      Var1=c("a","a","b"),
                      date2=c(2014,2010,2013),
                      oID=NA,
                      CORRECToID=c(1,2,3),
                      points=c(5, 10,15),
                      stringsAsFactors=FALSE)

I have tried various combinations of like the code below, but without luck:

df.m$oID[df.m$date2 < df.o$date3 & df.m$date2 > df.o$date1 & df.o$Var1==df.m$Var1] <- df.o$oID

I have also tried experimenting with various combinations of ifelse, which and match, but none seem to do the trick.

The problem I keep encountering is that my replacement was a different number of rows than data and that "longer object length is not a multiple of shorter object length".

Upvotes: 1

Views: 119

Answers (1)

David Arenburg
David Arenburg

Reputation: 92302

What you are looking for is called an "overlap join", you could try the data.table::foverlaps function in order to achieve this.

The idea is simple

  1. Create the columns to overlap on (add an additional column to df.m)
  2. key by these columns
  3. run foverlaps and select the column you want back

    library(data.table)
    setkey(setDT(df.m)[, date4 := date2], Var1, date2, date4)
    setkey(setDT(df.o), Var1, date1, date3)
    foverlaps(df.m, df.o)[, names(df.m), with = FALSE]
    #    mID Var1 date2 oID CORRECToID points date4
    # 1:   2    a  2010   2          2     10  2010
    # 2:   1    a  2014   1          1      5  2014
    # 3:   3    b  2013   3          3     15  2013
    

Upvotes: 2

Related Questions