Bustergun
Bustergun

Reputation: 1037

R vlookup based on the closest numeric variable

I want to do something similar to vlookup in R, using numeric variable as a basis.

Sample lookup table:

> Value <- c(1,1.5,2,2.5,3,3.5,4,4.5,5)
> Code <- c("A","B","C","D","E","F","G","H","I")
> Lookup_Table <- data.frame(Value, Code)
> Lookup_Table
  Value Code
1   1.0    A
2   1.5    B
3   2.0    C
4   2.5    D
5   3.0    E
6   3.5    F
7   4.0    G
8   4.5    H
9   5.0    I

Sample data table:

> DataSample <- c(1.2,1,2.3,2.7,3.1,3,4.6,4.5,3.8)
> DataSample <- data.frame(DataSample)
> DataSample
  DataSample
1        1.2
2        1.0
3        2.3
4        2.7
5        3.1
6        3.0
7        4.6
8        4.5
9        3.8

So from this DataSample values I want to match the corresponding Code based on the lookup table values. And if my value is for example 1.2, I want to round it up to the nearest value on the lookup table which is 1.5. So I expect to get the corresponding code for 1.5.

My desired output is:

> DataSample
  DataSample Code
1        1.2    B
2        1.0    A
3        2.3    D
4        2.7    E
5        3.1    F
6        3.0    E
7        4.6    I
8        4.5    H
9        3.8    G

Upvotes: 2

Views: 266

Answers (3)

ThomasIsCoding
ThomasIsCoding

Reputation: 101663

A data.table option with non-equi join

setorder(
  setDT(Lookup_Table),
  "Value"
)[setDT(DataSample),
  on = .(Value >= DataSample)
][
  ,
  .(Code = first(Code)), .(DataSample = Value)
]

which gives

   DataSample Code
1:        1.2    B
2:        1.0    A
3:        2.3    D
4:        2.7    E
5:        3.1    F
6:        3.0    E
7:        4.6    I
8:        4.5    H
9:        3.8    G

Upvotes: 2

Ben
Ben

Reputation: 30474

A base R approach could use findInterval like this:

DataSample$Code <- with(Lookup_Table, 
                        Code[findInterval(DataSample$DataSample, Value, left.open = T) + 1]) 

Output

  DataSample Code
1        1.2    B
2        1.0    A
3        2.3    D
4        2.7    E
5        3.1    F
6        3.0    E
7        4.6    I
8        4.5    H
9        3.8    G

Upvotes: 3

Bulat
Bulat

Reputation: 6979

Here I have used data.table to:

  1. create intervals in your lookup table
  2. apply foverlaps function to do the merge
value <- c(1,1.5,2,2.5,3,3.5,4,4.5,5)
code <- c("A","B","C","D","E","F","G","H","I")
Lookup_Table <- data.frame(value, code)
setDT(Lookup_Table)

Lookup_Table <- Lookup_Table[order(value)]
Lookup_Table[, previous.value := shift(value)]
Lookup_Table[, next.value := shift(value, type = "lead")]
Lookup_Table[, start := (previous.value + value) / 2]
Lookup_Table[, end := (next.value + value) / 2]
Lookup_Table[is.na(start), start := value]
Lookup_Table[is.na(end), end := value]
Lookup_Table <- Lookup_Table[, .(start, end, value, code)]
setkey(Lookup_Table, start, end)

DataSample <- data.frame(value = c(1.2,1,2.3,2.7,3.1,3,4.6,4.5,3.8))
setDT(DataSample)
DataSample[, start := value]
DataSample[, end := value]
DataSample <- DataSample[, .(start, end, value)]
setkey(DataSample, start, end)


res <- foverlaps(
  DataSample, 
  Lookup_Table, 
  by.x = c("start", "end"),
  by.y = c("start", "end")
)

res <- res[, .(value = i.value, code)]

> res
#   value code
# 1:   1.0    A
# 2:   1.2    A
# 3:   2.3    D
# 4:   2.7    D
# 5:   3.0    E
# 6:   3.1    E
# 7:   3.8    G
# 8:   4.5    H
# 9:   4.6    H

Result is slightly different, you probably want to play with how ranges are defined and applied

Upvotes: 2

Related Questions