steppermotor
steppermotor

Reputation: 701

Joining two datasets using fuzzy logic

I’m trying to do a fuzzy logic join in R between two datasets:

I would like to use the name column to join between the two data sets. However the name column may have additional or leading characters in either data set or have one word contained inside of a larger word. So for example if we looked at these two data sets, I'd like the name OPAL to join to the OPALAS, and SAUSALITO Y to join to SAUSALITO.

Dataset1:    
     Name           Config
     ALTO D         BB
     CONTRA         ST
     EIGHT A        DD
     OPALAS         BB
     SAUSALITO Y    AA
     SOLANO J       ST

Dataset2:    
    Name       Age     Rank
    ALTO D     50      2
    ALTO D     20      6
    CONTRA     10      10
    CONTRA     15      15
    EIGHTH     18      21
    OPAL       19      4
    SAUSALITO  2       12
    SOLANO     34      43

Code for Summarizing of Dataset2

Data2a <- summaryBy(Age ~ Name,FUN=c(mean), data=Data2,na.rm=TRUE)
Data2b <- summaryBy(Rank ~ Name,FUN=c(sum), data=Data2,na.rm=TRUE)
Data2 <- data.frame(Data2a$Name, Data2a$Age.mean, Data2b$Rank.sum)

Desired Outcome:
    Name        Config  Age   Rank
    ALTO D      BB      35    8
    CONTRA      ST      12.5  25
    EIGHT A     DD      18    21
    OPALAS      BB      19    4
    SAUSALITO Y AA      12    5
    SOLANO J    ST      34    43

Upvotes: 12

Views: 6919

Answers (1)

steppermotor
steppermotor

Reputation: 701

I was able to join the two datasets, using the fuzzyjoin package:

library(fuzzyjoin)
stringdist_inner_join(Dataset1, Data2,
     by ="Name", distance_col = NULL)

Upvotes: 12

Related Questions