Dennis Aguilar
Dennis Aguilar

Reputation: 113

left_join by categories and range values

I want to left_join two dataframes.

pensions <- data.frame(gender = c(rep("Female", 3 ), (rep("Male", 2 ) )), 
                   age = c(66,69,72,85,62), 
                   type = c(rep("widow",2), rep("disability", 2 ), "old age"))




gender  <- c(rep("Female", 12), rep("Male", 12))
type    <- rep(c(rep("widow",4), rep("disability",4), rep("old_age",4)),2)
min_age <- rep(c(65,seq(76, 99, by = 10)),6)
max_age <- rep(c(seq(75, 99, by = 10),99),6)
factor   <- c(435,215,415,89,10,45,36,74,98,15,27,23,
         117,343,201,58,41,95,777,65,21,57,13,6)


pension_factors <- data.frame(gender, type, min_age, max_age, factor )

The first dataframe pensions contains the columns gender, age, and type. The latter reoresents the type of pension.

The second dataframe pension_factors contains the columns gender, type, min_age, max_age, and factor.

How do I left_join pensions with pension_factors in a way that I left_join by gender and type and the age of pensions$age is equal or greater than pension_factors$min_age but less than or equal to pension_factors$max_age?

Upvotes: 0

Views: 518

Answers (3)

Shirin Yavari
Shirin Yavari

Reputation: 682

You can first merge the two datasets:

mergeddf<-merge(x=pensions,y=pension_factors,by.x=c("gender", "type"),by.y=c("gender", "type"),all.x = TRUE)

then only take a subset of the merged dataset in which the age is between min and max age:

finaldf<-mergeddf[which(mergeddf$age>=mergeddf$min_age & mergeddf$age<=mergeddf$max_age),]
     #  gender       type age min_age max_age factor
     #1 Female disability  72      65      75     10
     #5 Female      widow  66      65      75    435
     #9 Female      widow  69      65      75    435
     #14  Male disability  85      76      85     95

Upvotes: 1

Maurits Evers
Maurits Evers

Reputation: 50668

We could use fuzzyjoin:

library(fuzzyjoin)
library(tidyverse)
fuzzy_left_join(
    pensions %>% mutate_if(is.factor, as.character),
    pension_factors %>% mutate_if(is.factor, as.character),
    by = c(
        "gender" = "gender",
        "type" = "type",
        "age" = "min_age",
        "age" = "max_age"),
    match_fun = c(`==`, `==`, `>=`, `<=`))
#  gender.x age     type.x gender.y     type.y min_age max_age factor
#1   Female  66      widow   Female      widow      65      75    435
#2   Female  69      widow   Female      widow      65      75    435
#3   Female  72 disability   Female disability      65      75     10
#4     Male  85 disability     Male disability      76      85     95
#5     Male  62    old age     <NA>       <NA>      NA      NA     NA

Explanation: by lists all columns by which we match, and match_fun lists the corresponding matching conditions.

Upvotes: 1

divibisan
divibisan

Reputation: 12155

You should be able to join and then filter the joined data frame:

library(dplyr)

left_join(pensions, pension_factors, by=c('gender', 'type')) %>%
    filter(age >= min_age,
           age <= max_age)

  gender age       type min_age max_age factor
1 Female  66      widow      65      75    435
2 Female  69      widow      65      75    435
3 Female  72 disability      65      75     10
4   Male  85 disability      76      85     95

Upvotes: 2

Related Questions