Akshay Gupta

Reputation: 11

How to check the similarity between two lists in two different Excel files using Python?

I have two lists of customer names. The names can be similar or different. How can I find the similarity between these two lists using Python?

Once I have the similarity, I want to pull the corresponding data from one Excel file into the other.

example:

List 1:

Customer Name       Unique ID
IBM                 2365
BOA                 5456
BMW AG              2456

List 2:

Customer Name     Unique ID
IBM Pvt Ltd        
BMW Group
Robert Bosch
BOA Ltd

This is just sample data. The actual data contains almost 300k lines.

I tried Jaccard similarity by passing the two lists separately, as Excel files, to the function, but the result (i.e. the Jaccard similarity) is always zero.
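For the sample data above, a Jaccard similarity computed over whole names is zero because no name appears verbatim in both lists. The following is a minimal sketch of that effect, assuming the names are already loaded into plain Python lists; the helper names `jaccard` and `tokens` are made up for illustration. Comparing word tokens of two individual names gives a non-zero score.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Sample names from the question (assumed to be read from the Excel files already).
list1 = ["IBM", "BOA", "BMW AG"]
list2 = ["IBM Pvt Ltd", "BMW Group", "Robert Bosch", "BOA Ltd"]

# Treating each list as a set of full names: no exact name is shared.
whole_list_score = jaccard(list1, list2)


def tokens(name):
    """Lower-cased word tokens of a single name (hypothetical helper)."""
    return set(name.lower().split())


# Comparing two individual names by their word tokens instead.
token_score = jaccard(tokens("BMW AG"), tokens("BMW Group"))

print(whole_list_score, token_score)
```

Here `whole_list_score` is 0.0, while `token_score` is 1/3 because the two names share the token `bmw` out of three distinct tokens.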

Edit: How can I iterate through both lists, compare each element with every element of the other list, and build a distance matrix?

Then I would like to sort each row of that matrix in descending order to find the closest match. Or is there a better method to find the closest match once the matrix is built?

Upvotes: 0

Views: 862

Answers (1)

melvil james

Reputation: 611

Could you elaborate and make your question a little clearer?

What do you mean by similarity between two lists?

When you say list, do you mean a CSV/Excel list or a Python list? If you are looking for the distance between strings, you might want to look at the Levenshtein algorithm: https://www.geeksforgeeks.org/edit-distance-dp-5/

A Python implementation: https://www.python-course.eu/levenshtein_distance.php
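As a rough sketch of how the Levenshtein distance could be applied here (not taken from either link), the code below computes the edit distance with the standard dynamic-programming recurrence and ranks every name in List 1 for each name in List 2. The sample names and IDs come from the question; the `ranked` structure and the lower-casing step are assumptions made for illustration. Because this is a distance rather than a similarity, rows are sorted in ascending order so the closest candidate comes first.

```python
def levenshtein(s, t):
    """Edit distance between s and t, keeping only two rows of the DP table."""
    if len(s) < len(t):
        s, t = t, s  # keep the shorter string on the inner loop
    previous = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        current = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            current.append(min(
                previous[j] + 1,         # delete a character from s
                current[j - 1] + 1,      # insert a character into s
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


# Sample data from the question, assumed to be loaded from the Excel files.
list1 = {"IBM": 2365, "BOA": 5456, "BMW AG": 2456}
list2 = ["IBM Pvt Ltd", "BMW Group", "Robert Bosch", "BOA Ltd"]

# One row per List 2 name: (distance, List 1 name) pairs, closest first.
ranked = {
    name: sorted((levenshtein(name.lower(), cand.lower()), cand) for cand in list1)
    for name in list2
}

for name, row in ranked.items():
    distance, best = row[0]
    print(name, "->", best, list1[best], "distance", distance)
```

Raw edit distance penalises length differences, so suffixes such as "Pvt Ltd" can dominate the score and produce ties or wrong matches; stripping such suffixes or applying a distance threshold before copying a Unique ID is probably needed. Comparing every pair is also quadratic, which is expensive for 300k lines.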

Since your data size is humongous, also check the external merge sort strategy.

Upvotes: 0
