Reputation: 10051
I have the following dataframe:
import pandas as pd

df = pd.DataFrame(
    {'id': [1, 2, 3, 4, 5, 6],
     'fruits': ['apple', 'apples', 'orange', 'apple tree', 'oranges', 'mango']
    })
   id      fruits
0   1       apple
1   2      apples
2   3      orange
3   4  apple tree
4   5     oranges
5   6       mango
I want to find fuzzy matches among the strings in the fruits column
and get a new dataframe like the expected result shown further down, keeping only pairs whose ratio_score is higher than 80.
How can I do that in Python using the fuzzywuzzy package? Thanks. Please note that the ratio_score
values in the expected result are made up as an example.
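My solution so far:
from fuzzywuzzy import fuzz
df.loc[:,'fruits_copy'] = df['fruits']
# Each row is compared with its own copy here, so every score comes out as 100.
df['ratio_score'] = df[['fruits', 'fruits_copy']].apply(lambda row: fuzz.ratio(row['fruits'], row['fruits_copy']), axis=1)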
Expected result:
   id  fruits  matched_id matched_fruits ratio_score
0   1   apple           2         apples          95
1   1   apple           4     apple tree          85
2   2  apples           4     apple tree          80
3   3  orange           5        oranges          95
4   6   mango
Related references:
Fuzzy matching a sorted column with itself using python
Apply fuzzy matching across a dataframe column and save results in a new column
How do I fuzzy match items in a column of an array in python?
Using fuzzywuzzy to create a column of matched results in the data frame
Upvotes: 3
Views: 5275
Reputation: 10051
My solution, based on the reference "Apply fuzzy matching across a dataframe column and save results in a new column":
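import pandas as pd
from fuzzywuzzy import fuzz

df.loc[:,'fruits_copy'] = df['fruits']
# Build every (fruits, fruits_copy) combination as a MultiIndex-backed Series of tuples.
compare = pd.MultiIndex.from_product([df['fruits'],
                                      df['fruits_copy']]).to_series()
def metrics(tup):
    # Score one pair of strings with two fuzzywuzzy scorers.
    return pd.Series([fuzz.ratio(*tup),
                      fuzz.token_sort_ratio(*tup)],
                     ['ratio', 'token'])
compare.apply(metrics)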
                       ratio  token
apple      apple         100    100
           apples         91     91
           orange         36     36
           apple tree     67     67
           oranges        33     33
           mango          20     20
apples     apple          91     91
           apples        100    100
           orange         33     33
           apple tree     62     62
           oranges        46     46
           mango          18     18
orange     apple          36     36
           apples         33     33
           orange        100    100
           apple tree     25     25
           oranges        92     92
           mango          55     55
apple tree apple          67     67
           apples         62     62
           orange         25     25
           apple tree    100    100
           oranges        24     24
           mango          13     13
oranges    apple          33     33
           apples         46     46
           orange         92     92
           apple tree     24     24
           oranges       100    100
           mango          50     50
mango      apple          20     20
           apples         18     18
           orange         55     55
           apple tree     13     13
           oranges        50     50
           mango         100    100
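The table above still contains self-matches and both orderings of every pair. To get closer to the layout in the question, one option is to score each unordered pair once and keep only those above the threshold. Below is a minimal sketch under a few assumptions: the helper names `ratio` and `fuzzy_pairs` are made up for illustration, and `difflib.SequenceMatcher` from the standard library is used as a stand-in for `fuzz.ratio` so that the snippet runs without extra packages. Swapping `fuzz.ratio` back in should be a one-line change.

```python
from difflib import SequenceMatcher
from itertools import combinations


def ratio(a, b):
    """Stand-in for fuzz.ratio (assumption): similarity as an integer 0-100."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def fuzzy_pairs(ids, names, threshold=80):
    """Return one dict per unordered pair whose score exceeds the threshold."""
    rows = []
    # combinations() yields each pair once and never pairs an item with itself.
    for (id_a, name_a), (id_b, name_b) in combinations(zip(ids, names), 2):
        score = ratio(name_a, name_b)
        if score > threshold:
            rows.append({
                'id': id_a, 'fruits': name_a,
                'matched_id': id_b, 'matched_fruits': name_b,
                'ratio_score': score,
            })
    return rows


ids = [1, 2, 3, 4, 5, 6]
fruits = ['apple', 'apples', 'orange', 'apple tree', 'oranges', 'mango']
matches = fuzzy_pairs(ids, fruits, threshold=80)
for row in matches:
    print(row)
```

With the scores from the table above, only apple/apples (91) and orange/oranges (92) exceed 80. Passing `matches` to `pd.DataFrame` should give the columns from the expected result. Rows without any match, such as mango, are not included in this sketch.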
Upvotes: 1