Reputation: 296
I have two data frames
df1
+-------+---------+
| Id | Title |
+-------+---------+
| 1 | AAA |
| 2 | BBB |
| 3 | CCC |
+-------+---------+
AND
df2
+-------+---------------+------------------------------------+
| Id | Sub | Body |
+-------+---------------+------------------------------------+
| 1 | some sub1 | some mail body AAA some text here |
| 2 | some sub2 | some text here BBB continues here |
| 3 | some sub3 | some text AAA present here |
| 4 | some sub4 | AAA string is present here also |
| 5 | some sub5 | CCC string is present here |
+-------+---------------+------------------------------------+
I want to match the Title
from df1 with the Body
column of the df2,
if title string is present in Body column, then both the rows should be joined, the output data frame should be like:
df3
+----------+---------------+------------------------------------+
| Title | Sub | Body |
+----------+---------------+------------------------------------+
| AAA | some sub1 | some mail body AAA some text here |
| BBB | some sub2 | some text here BBB continues here |
| AAA | some sub3 | some text AAA present here |
| AAA | some sub4 | AAA string is present here also |
| CCC | some sub5 | CCC string is present here |
+----------+---------------+------------------------------------+
Upvotes: 1
Views: 963
Reputation: 35338
One solution could look like this, although more experienced R users will probably come up with better answers
# set up test data
df1 <- data.frame(stringsAsFactors = F,
id = 1:3,
title = c('AAA', 'BBB', 'CCC'))
df2 <- data.frame(stringsAsFactors = F,
id = 1:5,
sub = c('some sub1', 'some sub2', 'some sub3', 'some sub4', 'some sub5'),
body = c('some mail body AAA some text here',
'some text here BBB continous here',
'some text AAA present here',
'AAA string is present here also',
'CCC string is present here'))
# join data frames
df.list <- lapply(1:nrow(df1), function (idx) cbind(title=df1[idx,2], df2[grepl(df1$title[idx], df2$body), 2:3]))
do.call('rbind', df.list)
which will result in the following output
title sub body
1 AAA some sub1 some mail body AAA some text here
3 AAA some sub3 some text AAA present here
4 AAA some sub4 AAA string is present here also
2 BBB some sub2 some text here BBB continous here
5 CCC some sub5 CCC string is present here
If we cannot rely on the fact that each title will match some rows in df2
then you might want to do something like this
# set up test data
df1 <- data.frame(stringsAsFactors = F,
id = 1:4,
title = c('AAA', 'AAA BB', 'BBB', 'CCC'))
df2 <- data.frame(stringsAsFactors = F,
id = 1:5,
sub = c('some sub1', 'some sub2', 'some sub3', 'some sub4', 'some sub5'),
body = c('some mail body AAA some text here',
'some text here BBB continous here',
'some text AAA present here',
'AAA string is present here also',
'CCC string is present here'))
MergeByTitle <- function(title.idx) {
df2.hits <- df2[grepl(df1$title[title.idx], df2$body), 2:3]
if (nrow(df2.hits) > 0)
cbind(title=df1[title.idx,2], df2.hits)
}
# join data frames
df.list <- lapply(1:nrow(df1), MergeByTitle)
do.call('rbind', df.list)
Upvotes: 3