Reputation: 125
I have seen a couple of Word2Vec models that can generate embeddings for company names and perform well when different formats of the same company name are given. But what I want to do is a bit different. For example, I have a list of company names like: ["abc informatics", "xyz communications", "intra soft", "gigabyte"]. Now, if a new company name comes up, I want to check whether it matches any of the existing company names at a threshold of 80% (probably through cosine similarity or some other approach). Since the embedding models are trained on international companies, they perform poorly for local companies. Another problem is that Word2Vec captures semantics when generating embeddings: for example, "Plants Ltd" and "Trees Ltd" will get similar embeddings, but in reality they are two quite different companies.
I am open to any other solutions if embedding similarity search does not work well.
This question is probably a duplicate of Create embeddings for string matching, but since it didn't receive any good answers I am asking it here anyway.
Upvotes: 0
Views: 498
Reputation: 1
I found that n-grams work better than semantic embeddings for company names and addresses.
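To make that concrete, here is a minimal sketch of how n-gram matching could be applied to the list in the question. It is not the exact setup I used: it assumes character trigrams, lowercasing, and space padding, and the helper names (char_ngrams, cosine_similarity, best_match) are made up for illustration. It only needs the standard library.

```python
from collections import Counter
from math import sqrt


def char_ngrams(name, n=3):
    """Count character n-grams of a name (hypothetical helper).

    Assumption: lowercase, collapse whitespace, and pad with one space on
    each side so that word boundaries also produce n-grams.
    """
    text = " " + " ".join(name.lower().split()) + " "
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(new_name, known_names, threshold=0.8, n=3):
    """Return (name, score) for the closest known name.

    The name is None when the best score is below the threshold.
    """
    query = char_ngrams(new_name, n)
    best_name, best_score = None, 0.0
    for name in known_names:
        score = cosine_similarity(query, char_ngrams(name, n))
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return best_name, best_score
    return None, best_score


known = ["abc informatics", "xyz communications", "intra soft", "gigabyte"]
print(best_match("ABC Informatic", known))  # near-duplicate spelling
print(best_match("delta foods", known))     # unrelated name
print(cosine_similarity(char_ngrams("Plants Ltd"), char_ngrams("Trees Ltd")))
```

Because the score depends only on shared character sequences, "Plants Ltd" and "Trees Ltd" overlap only on the "Ltd" part and stay well under the 80% threshold. For a large list, the same idea can be done with a TF-IDF vectorizer over character n-grams, which also down-weights common suffixes like "ltd".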
Upvotes: 0