Reputation: 35
I have a column in a data.table (DT1)
that is a decreasing count (let's say pol_count
), and another column for the average age in a population (say AverageAge
). I am trying to take my value of pol_count
(say 400) and my value of AverageAge
(say 85) and match them to an interval within a table called FactorFile
.
Here is a glimpse of FactorFile
:
Count_Greater_Than Age_Less_Than Months Factor
100 80 12 1
85 82 16 0.85
65 84 20 0.65
45 86 24 0.45
In this instance, pol_count
is > 100 so it would return the third column's (Months
) value of 12, and AverageAge
is < 86 but > 84 so it would return 20. I then need another column to take the max, so final answer is 20. Finally, a column for the Factor
associated with 20 which is 65%.
I'm not sure how to join on a table without using exact matches. Basically I want a VLOOKUP with the last argument as TRUE, to pull the closest value to the supplied lookup value.
I've tried using the package fuzzyjoin
, here is my statement. It runs, but nothing seems to happen to my data.table:
fuzzy_left_join(DT1, FactorFile, by = c("AverageAge" = "Average_Age_Less_Than"), match_fun = `<=`)
Any tips on how to use fuzzyjoin correctly or an easier way to make this join happen? To summarize, I'm trying to take two columns in DT1, and return both a Months column and Factor column onto DT1, by joining on FactorFile.
Thanks!
Upvotes: 1
Views: 114
Reputation: 25225
Maybe something like this:
DT1[, c("M1", "F1") :=
FactorFile[.SD, on=.(Count_Greater_Than=pol_count), roll=Inf, .(Months, Factor)]
]
DT1[, c("M2", "F2") :=
FactorFile[.SD, on=.(Age_Less_Than=AverageAge), roll=Inf, .(Months, Factor)]
]
DT1[, c("M", "Fac") := {
mm <- pmax(M1, M2)
.(mm, fifelse(mm==M1, F1, F2))
}]
output:
pol_count AverageAge M1 F1 M2 F2 M Fac
1: 400 85 12 1 20 0.65 20 0.65
data:
library(data.table)
DT1 <- data.table(pol_count=400, AverageAge=85)
FactorFile <- fread("
Count_Greater_Than Age_Less_Than Months Factor
100 80 12 1
85 82 16 0.85
65 84 20 0.65
45 86 24 0.45")
Upvotes: 1