Reputation: 730
Reproducible Dataframes
Table 1
table1 = {'Text':['hello this is', 'a test data', 'frame for', 'stackoverflow'], 'keyid':[20, 21, 19, 18]}
table1 = pd.DataFrame(table1)
Text keyid
0 hello this is 20
1 a test data 21
2 frame for 19
3 stackoverflow 18
Table 2
table2 = {'word': ['hello', 'over','for','is', 'hey'], 'count': [1, 2, 1, 3, 5]}
table2 = pd.DataFrame(table2)
word count
0 hello 1
1 over 2
2 for 1
3 is 3
4 hey 5
I'm trying to create a conditional update of my table 1 based on the condition: If string in table 2 'word' column is found in table 1 'Text' column, then bring over the 'count' column from table 2, else leave it as NA.
Expected Output
Text keyid count
0 hello this is 20 1
1 a test data 21 NA
2 frame for 19 1
3 stackoverflow 18 NA
Note: 'over' appears in the Text column but it is not reflected in the expected output, as I do not require matching within the string itself.
Could someone point me in the right direction?
Upvotes: 2
Views: 279
Reputation: 75080
You can use series.str.extract()
with a pattern by word boundaries and then map
to get the respective table2 count
:
d=table2.set_index('word')['count']
p='({})'.format('\\b|\\b'.join(table2.word))
#'(hello\\b|\\bover\\b|\\bfor\\b|\\bis\\b|\\bhey)'
table1['count']=table1.Text.str.extract(p,expand=False).map(d)
print(table1)
Text keyid count
0 hello this is 20 1.0
1 a test data 21 NaN
2 frame for 19 1.0
3 stackoverflow 18 NaN
Upvotes: 2