Reputation: 11
I have two lists of customer names. A name in one list may match a name in the other exactly, match it only approximately, or have no counterpart at all. How can I measure the similarity between the names in these two lists using Python?
Once I have the similarity, I want to pull the corresponding data from one Excel file into the other.
Example:
List 1:
Customer Name    Unique ID
IBM              2365
BOA              5456
BMW AG           2456
List 2:
Customer Name    Unique ID
IBM Pvt Ltd
BMW Group
Robert Bosch
BOA Ltd
This is just sample data. The actual data contains almost 300k lines.
I tried Jaccard similarity by passing the two lists to the function as separate Excel files, but the result (i.e. the Jaccard similarity) is always zero.
Edit: How can I iterate through both lists, compare each element with every element of the other list, and build a distance matrix?
I would then like to sort each row of that matrix so that the closest match comes first. Or is there a better way to find the closest match once the matrix is built?
Upvotes: 0
Views: 862
Reputation: 611
Could you elaborate and make your question a little clearer?
What do you mean by similarity between two lists?
When you say list, do you mean a CSV/Excel list or a Python list? If you are looking for the distance between strings, you may want to look at the Levenshtein (edit distance) algorithm: https://www.geeksforgeeks.org/edit-distance-dp-5/
A Python implementation: https://www.python-course.eu/levenshtein_distance.php
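To make the idea concrete, here is a minimal sketch of edit distance in plain Python, applied to the sample names from the question. The helper names `levenshtein` and `closest` are my own, and lowercasing the names before comparing is an assumption about what counts as "the same" customer. It compares every name in List 2 against every name in List 1, so it is only meant to illustrate the technique, not to handle 300k rows as is.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def closest(name, candidates):
    """Return (best_candidate, distance) for name, ignoring case."""
    return min(((c, levenshtein(name.lower(), c.lower())) for c in candidates),
               key=lambda pair: pair[1])


# Sample data from the question: List 1 maps name -> Unique ID.
list1 = {"IBM": 2365, "BOA": 5456, "BMW AG": 2456}
list2 = ["IBM Pvt Ltd", "BMW Group", "Robert Bosch", "BOA Ltd"]

for name in list2:
    match, dist = closest(name, list1)
    print(name, "->", match, list1[match], "distance", dist)
```

Note that `closest` always returns some candidate, even for a name such as "Robert Bosch" that has no real counterpart, so in practice you would probably want to reject matches whose distance is above a cutoff you choose.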
Since your data set is very large, also look into an external merge sort strategy.
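On the size problem: comparing every name with every other name means roughly 300k x 300k comparisons. The sketch below is not external merge sort but a different, commonly used way to cut that down, usually called blocking: only names that share the same first word are compared. It uses `difflib.SequenceMatcher` from the standard library for the score. The function names, the first-word rule, and the `threshold` value of 0.3 are all assumptions for illustration. The first-word rule happens to fit the sample data, but it may miss matches in the real data.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def first_token(name: str) -> str:
    """Blocking key: the lowercased first word (empty string if none)."""
    tokens = name.lower().split()
    return tokens[0] if tokens else ""


def match_by_block(source, targets, threshold=0.3):
    """source: dict of name -> id. targets: list of names to look up.

    Returns a dict of target -> (source_name, id), or None when no
    candidate in the same block scores at least `threshold`.
    """
    blocks = defaultdict(list)
    for name in source:
        blocks[first_token(name)].append(name)

    result = {}
    for target in targets:
        best_name, best_score = None, threshold
        # Only names sharing the first word are compared.
        for candidate in blocks.get(first_token(target), []):
            score = SequenceMatcher(None, target.lower(),
                                    candidate.lower()).ratio()
            if score >= best_score:
                best_name, best_score = candidate, score
        result[target] = (best_name, source[best_name]) if best_name else None
    return result


list1 = {"IBM": 2365, "BOA": 5456, "BMW AG": 2456}
list2 = ["IBM Pvt Ltd", "BMW Group", "Robert Bosch", "BOA Ltd"]
matches = match_by_block(list1, list2)
for name, hit in matches.items():
    print(name, "->", hit)
```

With this approach, names that land in an empty block (such as "Robert Bosch" in the sample) come back as `None` instead of being forced onto an unrelated customer.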
Upvotes: 0