Reputation: 113
I want to left_join
two dataframes.
pensions <- data.frame(gender = c(rep("Female", 3 ), (rep("Male", 2 ) )),
age = c(66,69,72,85,62),
type = c(rep("widow",2), rep("disability", 2 ), "old age"))
gender <- c(rep("Female", 12), rep("Male", 12))
type <- rep(c(rep("widow",4), rep("disability",4), rep("old_age",4)),2)
min_age <- rep(c(65,seq(76, 99, by = 10)),6)
max_age <- rep(c(seq(75, 99, by = 10),99),6)
factor <- c(435,215,415,89,10,45,36,74,98,15,27,23,
117,343,201,58,41,95,777,65,21,57,13,6)
pension_factors <- data.frame(gender, type, min_age, max_age, factor )
The first dataframe pensions contains the columns gender, age, and type. The latter reoresents the type of pension.
The second dataframe pension_factors contains the columns gender, type, min_age, max_age, and factor.
How do I left_join
pensions with pension_factors in a way that I left_join
by gender and type and the age of pensions$age
is equal or greater than pension_factors$min_age
but less than or equal to pension_factors$max_age
?
Upvotes: 0
Views: 518
Reputation: 682
You can first merge the two datasets:
mergeddf<-merge(x=pensions,y=pension_factors,by.x=c("gender", "type"),by.y=c("gender", "type"),all.x = TRUE)
then only take a subset of the merged dataset in which the age is between min and max age:
finaldf<-mergeddf[which(mergeddf$age>=mergeddf$min_age & mergeddf$age<=mergeddf$max_age),]
# gender type age min_age max_age factor
#1 Female disability 72 65 75 10
#5 Female widow 66 65 75 435
#9 Female widow 69 65 75 435
#14 Male disability 85 76 85 95
Upvotes: 1
Reputation: 50668
We could use fuzzyjoin
:
library(fuzzyjoin)
library(tidyverse)
fuzzy_left_join(
pensions %>% mutate_if(is.factor, as.character),
pension_factors %>% mutate_if(is.factor, as.character),
by = c(
"gender" = "gender",
"type" = "type",
"age" = "min_age",
"age" = "max_age"),
match_fun = c(`==`, `==`, `>=`, `<=`))
# gender.x age type.x gender.y type.y min_age max_age factor
#1 Female 66 widow Female widow 65 75 435
#2 Female 69 widow Female widow 65 75 435
#3 Female 72 disability Female disability 65 75 10
#4 Male 85 disability Male disability 76 85 95
#5 Male 62 old age <NA> <NA> NA NA NA
Explanation: by
lists all columns by which we match, and match_fun
lists the corresponding matching conditions.
Upvotes: 1
Reputation: 12155
You should be able to join and then filter the joined data frame:
library(dplyr)
left_join(pensions, pension_factors, by=c('gender', 'type')) %>%
filter(age >= min_age,
age <= max_age)
gender age type min_age max_age factor
1 Female 66 widow 65 75 435
2 Female 69 widow 65 75 435
3 Female 72 disability 65 75 10
4 Male 85 disability 76 85 95
Upvotes: 2