Reputation: 35
I have one dataframe df1
with the columns age
and testScore
. I need a third normScore
based on age
, testScore
and references on a second dataframe df2
. Let me show you:
df1
age | testScore | normScore |
---|---|---|
78.6 | 1 | |
87.1 | 2 |
df2
ageStart | ageEnd | testScore | normScore |
---|---|---|---|
70 | 79.9 | 1 | 15 |
70 | 79.9 | 2 | 20 |
80 | 89.9 | 1 | 18 |
80 | 89.9 | 2 | 24 |
So what I need is to compare age
in df1
to ageStart
and ageEnd
in df2
and then use testScore
in df1
to select normScore
in df2
and save normScore
in normScore
column in df1
, like as follows:
df1
age | testScore | normScore |
---|---|---|
78.6 | 1 | 15 |
87.1 | 2 | 24 |
On Excel I would use SUMPRODUCT, but I`m kinda new to R. Can you help me? (I'm posting again because I messed up the tables in the previous post, sorry about that)
Upvotes: 1
Views: 852
Reputation: 123768
Adapting this answer this could be achieved via the fuzzyjoin
package like so:
library(fuzzyjoin)
library(dplyr)
df1 <- data.frame(
age = c(78.6, 87.1),
testScore = c(1L, 2L)
)
df2 <- data.frame(
ageStart = c(70L, 70L, 80L, 80L),
ageEnd = c(79.9, 79.9, 89.9, 89.9),
testScore = c(1L, 2L, 1L, 2L),
normScore = c(15L, 20L, 18L, 24L)
)
fuzzy_left_join(df1, df2, by = c("testScore", "age" = "ageStart", "age" = "ageEnd"), match_fun = list(`==`, `>=`, `<=`)) %>%
select(age, testScore = testScore.x, normScore)
#> age testScore normScore
#> 1 78.6 1 15
#> 2 87.1 2 24
Upvotes: 2