Bharath KM
Bharath KM

Reputation: 296

Join two data frames by searching & matching strings

I have two data frames

df1

+-------+---------+  
|   Id  |  Title  |
+-------+---------+  
|   1   |   AAA   |
|   2   |   BBB   |
|   3   |   CCC   |
+-------+---------+

AND

df2

+-------+---------------+------------------------------------+
|   Id  |      Sub      |               Body                 |
+-------+---------------+------------------------------------+  
|   1   |   some sub1   | some mail body AAA some text here  |
|   2   |   some sub2   | some text here BBB continues here  |
|   3   |   some sub3   | some text AAA present here         |
|   4   |   some sub4   | AAA string is present here also    |
|   5   |   some sub5   | CCC string is present here         |
+-------+---------------+------------------------------------+

I want to match the Title from df1 with the Body column of the df2,
if title string is present in Body column, then both the rows should be joined, the output data frame should be like:

df3

+----------+---------------+------------------------------------+
|   Title  |      Sub      |               Body                 |
+----------+---------------+------------------------------------+  
|   AAA    |   some sub1   | some mail body AAA some text here  |
|   BBB    |   some sub2   | some text here BBB continues here  |
|   AAA    |   some sub3   | some text AAA present here         |
|   AAA    |   some sub4   | AAA string is present here also    |
|   CCC    |   some sub5   | CCC string is present here         |
+----------+---------------+------------------------------------+

Upvotes: 1

Views: 963

Answers (1)

DAXaholic
DAXaholic

Reputation: 35338

One solution could look like this, although more experienced R users will probably come up with better answers

# set up test data
df1 <- data.frame(stringsAsFactors = F,
                  id = 1:3,
                  title = c('AAA', 'BBB', 'CCC'))
df2 <- data.frame(stringsAsFactors = F,
                  id = 1:5,
                  sub = c('some sub1', 'some sub2', 'some sub3', 'some sub4', 'some sub5'),
                  body = c('some mail body AAA some text here',
                           'some text here BBB continous here',
                           'some text AAA present here',
                           'AAA string is present here also',
                           'CCC string is present here'))

# join data frames
df.list <- lapply(1:nrow(df1), function (idx) cbind(title=df1[idx,2], df2[grepl(df1$title[idx], df2$body), 2:3]))
do.call('rbind', df.list)

which will result in the following output

  title       sub                              body
1   AAA some sub1 some mail body AAA some text here
3   AAA some sub3        some text AAA present here
4   AAA some sub4   AAA string is present here also
2   BBB some sub2 some text here BBB continous here
5   CCC some sub5        CCC string is present here

Update due to comment:

If we cannot rely on the fact that each title will match some rows in df2 then you might want to do something like this

# set up test data
df1 <- data.frame(stringsAsFactors = F,
                  id = 1:4,
                  title = c('AAA', 'AAA BB', 'BBB', 'CCC'))
df2 <- data.frame(stringsAsFactors = F,
                  id = 1:5,
                  sub = c('some sub1', 'some sub2', 'some sub3', 'some sub4', 'some sub5'),
                  body = c('some mail body AAA some text here',
                           'some text here BBB continous here',
                           'some text AAA present here',
                           'AAA string is present here also',
                           'CCC string is present here'))

MergeByTitle <- function(title.idx) {
  df2.hits <- df2[grepl(df1$title[title.idx], df2$body), 2:3]
  if (nrow(df2.hits) > 0)
    cbind(title=df1[title.idx,2], df2.hits)
}

# join data frames
df.list <- lapply(1:nrow(df1), MergeByTitle)
do.call('rbind', df.list)

Upvotes: 3

Related Questions