Left_Join without dublicate in R

Question

I'd like to make a kind of vlookup. As you know in Excel vlookup function takes first value from the data. On the other hand left_join function works similarly. But when the first data is not unique on lookup values, left_join function dublicate de values. I'd like to do same think with excel without making unique the first dataset. I dont want to manupilate any of the data. Just want to take first looked value.

For example. In this case we see Title A is dublicated in Data3 and also StartYear looks the same with EvaYear :(

I want to find EvaYear in between StartDate and EndDate. If EvaYear equel to EndDate and next period's StartDate. I'd like to take first row. That's why I had to use >= & <= in my formula at the same time..

Title <- c("A","A","A","B","B")
StartYear <- c(2000,2005,2008,2010,2012)
EndYear <- c(2005,2008,2010,2012,2015)
Score <- c(100,75,80,95,79)
Data1 <- data.frame(Title,StartYear,EndYear,Score)

Title <- c("A","B")
EvaYear <- c(2008,2015)
Data2 <- data.frame(Title,EvaYear)

setDT(Data2)
Data3 <- setDT(Data1)[Data2,.(Title,StartYear,EndYear,EvaYear,Score),on=.(Title,StartYear<=EvaYear,EndYear>=EvaYear)]

After run the query I get:

  Title   StartYear   EndYear   EvaYear  Score
     A        2008        2008      2008    75
     A        2008        2008      2008    80
     B        2015        2015      2015    79

But I need to get:

Title   StartYear   EndYear   EvaYear  Score
 A        2005        2008      2008    75
 B        2012        2015      2015    79

Uwe · Accepted Answer

As mentioned by r2evans, non-equi joins pick the columns from the second data.table. Columns of the first data.table can be referred to using the prefix x.:

setDT(Data1)[Data2, .(Title, x.StartYear, x.EndYear, EvaYear, Score), 
             on = .(Title, StartYear <= EvaYear, EndYear >= EvaYear)]

   Title x.StartYear x.EndYear EvaYear Score
1:     A        2005      2008    2008    75
2:     A        2008      2010    2008    80
3:     B        2012      2015    2015    79

The duplicate entries can be removed by mult = "first" as suggested by jangorecki:

setDT(Data1)[Data2, .(Title, StartYear = x.StartYear, EndYear = x.EndYear, EvaYear, Score), 
             on = .(Title, StartYear <= EvaYear, EndYear >= EvaYear), mult = "first"]

   Title StartYear EndYear EvaYear Score
1:     A      2005    2008    2008    75
2:     B      2012    2015    2015    79

I have also renamed the "x." columns to reproduce OP's expected result.

Left_Join without dublicate in R

Answers (2)

Related Questions