Reputation: 415
I am trying to find similar strings in two different pandas DataFrames using fuzzywuzzy. So far, all I can think of is to iterate over each frame and then use fuzz.ratio(v1, v2) to return a similarity percentage.
Logic like this:
    for v1_df1, v2_df1 in df1[['given_name', 'surname']].itertuples(index=False):
        for v1_df2, v2_df2 in df2[['given_name', 'surname']].itertuples(index=False):
            ratio_v1 = fuzz.ratio(v1_df1, v1_df2)
This is not suitable as written, but hopefully it demonstrates what I'm trying to do. I would like an efficient way to match strings across two separate pandas DataFrames and treat them as similar when the score reaches a configurable percentage.
Upvotes: 0
Views: 48
Reputation: 13349
Say you have

df1:
            Name First_Name
    0       Lara      Owlen
    1    Heiberg     Lanzer
    2      Willy      Jones
    3       Rosy       Lily
    4     Stuart     Littlt

df2:
            Name First_Name
    0     Braund       Owen
    1  Heikkinen      Laina
    2      Allen    William
    3      Moran      James
    4   McCarthy    Timothy
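    import itertools
    from fuzzywuzzy import fuzz

    # Every (df1 value, df2 value) combination, built in the same order for both columns
    p1 = list(itertools.product(df1['Name'].values, df2['Name'].values))
    p2 = list(itertools.product(df1['First_Name'].values, df2['First_Name'].values))

    # zip keeps the two lists aligned, so each iteration compares one df1 row with one df2 row
    for N1, N2 in zip(p1, p2):
        Name_ratio = fuzz.ratio(N1[0], N1[1])
        First_Name_ratio = fuzz.ratio(N2[0], N2[1])
        print(N1, Name_ratio, N2, First_Name_ratio)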
You can do it this way.
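To get the configurable threshold the question asks for, the same pairwise comparison can be wrapped in a small function. This is only a rough sketch: `similar_pairs`, `simple_ratio`, and the sample rows are names and data I made up for illustration, and the default scorer uses the standard library's `difflib` as a stand-in so the snippet runs without extra packages. Passing `scorer=fuzz.ratio` should work the same way, since it also takes two strings and returns a 0-100 score.

```python
from difflib import SequenceMatcher
from itertools import product


def simple_ratio(a, b):
    """Stand-in scorer (assumption): similarity of two strings on a 0-100 scale."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def similar_pairs(rows1, rows2, threshold=80, scorer=simple_ratio):
    """Compare every row of rows1 with every row of rows2.

    Each row is a (name, first_name) pair. A pair of rows is kept only
    when both column scores are at least `threshold`.
    """
    matches = []
    for r1, r2 in product(rows1, rows2):
        name_score = scorer(r1[0], r2[0])
        first_name_score = scorer(r1[1], r2[1])
        if name_score >= threshold and first_name_score >= threshold:
            matches.append((tuple(r1), tuple(r2), name_score, first_name_score))
    return matches


# Made-up sample rows, not the frames shown above
rows1 = [("Lara", "Owlen"), ("Willy", "Jones")]
rows2 = [("Lara", "Owen"), ("Allen", "William")]

matches = similar_pairs(rows1, rows2, threshold=80)
for row1, row2, name_score, first_name_score in matches:
    print(row1, row2, name_score, first_name_score)
```

With real DataFrames, the rows could come from something like `df1[['Name', 'First_Name']].itertuples(index=False, name=None)`, which yields plain tuples. Keep in mind this is still a full cross comparison, so the work grows with len(df1) * len(df2).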
Upvotes: 1