Reputation: 435
I have two dataframes: df1 and df2.
df1
looks like this:
id text
1 I love this car
2 I hate this car
3 Cars are life
4 Bikers are also good
df2
looks like this:
id text
1 I love this supercar
2 I hate cars
3 Cars are love
4 Bikers are nice
Now, I want to keep only those words that are in df1
and df2
.
The word car
is in df1
but not in df2
, so I want to remove it.
The word life
is in df1
but not in df2
, so I want to remove it.
The word also
is in df1
but not in df2
, so I want to remove it.
The word good
is in df1
but not in df2
, so I want to remove it.
The word supercar
is in df2
but not in df1
, so I want to remove it.
The word nice
is in df2
but not in df1
, so I want to remove it.
Expected output of df1:
id text
1 I love this
2 I hate this
3 Cars are
4 Bikers are
Expected output of df2
id text
1 I love this
2 I hate cars
3 Cars are love
4 Bikers are
Upvotes: 1
Views: 186
Reputation: 862591
Create intersection of words in both columns and then remove not matched values:
a = set([y for x in df1['text'] for y in x.split()])
b = set([y for x in df2['text'] for y in x.split()])
c = a & b
print (c)
{'hate', 'are', 'Bikers', 'this', 'love', 'I', 'Cars'}
df1['text'] = df1['text'].apply(lambda x: ' '.join(y for y in x.split() if y in c))
df2['text'] = df2['text'].apply(lambda x: ' '.join(y for y in x.split() if y in c))
print (df1)
id text
0 1 I love this
1 2 I hate this
2 3 Cars are
3 4 Bikers are
print (df2)
id text
0 1 I love this
1 2 I hate
2 3 Cars are love
3 4 Bikers are
Upvotes: 2