user2900006

Reputation: 447

Matching part of a string in a data frame to a string in another data frame

I have two separate data frames that look like this:

#data frame 1
set.seed(5)
first<-c("Jane, Sarah","Bill, Conrad", "Jim, Dave", "Mark, Ben", "Mike, Frank")
month<-c("Feb","Jan","Dec","Jun","Aug")
df1<-data.frame(first,month)

#data frame 2
first<-c("John", "Brendan", "Mark", "Dave", "Sarah", "Julie", "Frank", "Henry")
vals<-seq(8)*floor(runif(8,min=10, max=100))
df2<-data.frame(first,vals)

What I want to do is append to the first data frame the values from the second data frame when there is a match to either name (there won't be a match to both, just one). If there is no match, the value can be assigned a '0'.

The idea is to end up with a final data frame that looks like this:

#data frame final
first<-c("Jane, Sarah","Bill", "Jim, Dave", "Mark", "Mike, Frank")
month<-c("Feb","Jan","Dec","Jun","Aug")
vals<-c(95,0,140,276,399)
df3<-data.frame(first,month,vals)

I have tried using grep to match but can't seem to get the values to match. Any ideas on how to append these values for a partial match?

Upvotes: 1

Views: 48

Answers (1)

Mike H.

Reputation: 14370

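Would this work for you? We extract all the words from the first column, then loop over the results and look each name up in df2. Using vapply instead of lapply keeps vals a plain numeric column rather than a list column.

library(stringr)

df_res <- df1
# For each row: split into names, look up each name in df2,
# replace non-matches with 0, and keep the largest value.
df_res$vals <- vapply(str_extract_all(df1$first, "\\w+"), function(x) {
  res <- df2$vals[match(x, df2$first)]
  res[is.na(res)] <- 0
  max(res)
}, numeric(1))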

df_res
#         first month vals
#1  Jane, Sarah   Feb   95
#2 Bill, Conrad   Jan    0
#3    Jim, Dave   Dec  140
#4    Mark, Ben   Jun  276
#5  Mike, Frank   Aug  399
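
If you would rather not depend on stringr, the same idea can be sketched in base R. This is only a rough alternative, assuming names are always separated by a comma and optional whitespace; lookup_val and df_base are names I made up for illustration, not anything from the question. It takes the first matching value instead of max(), which should also behave if vals can ever be negative.

```r
# Rebuild the example data from the question.
set.seed(5)
df1 <- data.frame(
  first = c("Jane, Sarah", "Bill, Conrad", "Jim, Dave", "Mark, Ben", "Mike, Frank"),
  month = c("Feb", "Jan", "Dec", "Jun", "Aug"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  first = c("John", "Brendan", "Mark", "Dave", "Sarah", "Julie", "Frank", "Henry"),
  vals  = seq(8) * floor(runif(8, min = 10, max = 100)),
  stringsAsFactors = FALSE
)

# Split each entry on a comma followed by optional whitespace
# (assumption: this is the only separator used).
name_list <- strsplit(df1$first, ",\\s*")

# Hypothetical helper: value of the first name found in df2, or 0 if none match.
lookup_val <- function(nms) {
  hit <- df2$vals[match(nms, df2$first)]
  hit <- hit[!is.na(hit)]
  if (length(hit) == 0) 0 else hit[1]
}

df_base <- df1
df_base$vals <- vapply(name_list, lookup_val, numeric(1))
df_base
```

Since the question says at most one name per row will match, taking the first hit and taking the maximum should give the same result on this data.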

Upvotes: 1
