jony

Reputation: 934

Check similarity or synonyms between words in Python

I want to find synonyms of words.

If the word is "tall building", then I want to find all synonyms of it, like "long apartment", "large building", etc.

I used spaCy.

import en_core_web_sm
nlp = en_core_web_sm.load()

# mytokens is the list of phrases to compare against
for i in range(len(mytokens)):
    nlp('tall building').similarity(nlp(mytokens[i]))

I can't use this because it takes a lot of time.

I can't use PhraseMatcher for this either.

Please help me.

Thanks in advance.

Upvotes: 0

Views: 4723

Answers (2)

polm23

Reputation: 15593

So it's a little hard to tell from your example, but it looks like you're creating a new spaCy doc in every iteration of your loop, which will be slow. You should do something like this instead:

import spacy
# Load the model by its full package name (the same one used in the question)
nlp = spacy.load('en_core_web_sm')

query = nlp('tall building')
for token in mytokens:
    query.similarity(nlp(token))

This way spaCy only has to create the query doc once.

If you want to make repeated queries, you should compute the vector for each doc once and store it in Annoy or a similar nearest-neighbor index, so you can look up the most similar docs quickly.
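To make the "compute once, query many times" idea concrete, here is a minimal sketch. It is not Annoy itself: it uses a plain brute-force cosine search from the standard library as a stand-in, and the three-dimensional vectors are invented for illustration. In practice each stored vector would be something like nlp(phrase).vector, computed a single time. The names build_index and most_similar are hypothetical helpers, not part of spaCy or Annoy.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def build_index(phrase_vectors):
    """Normalize every stored vector once, up front (hypothetical helper)."""
    return {phrase: normalize(vec) for phrase, vec in phrase_vectors.items()}


def most_similar(index, query_vec, k=3):
    """Return the k stored phrases with the highest cosine similarity to the query."""
    q = normalize(query_vec)
    scored = [
        (sum(a * b for a, b in zip(q, vec)), phrase)
        for phrase, vec in index.items()
    ]
    scored.sort(reverse=True)
    return [(phrase, score) for score, phrase in scored[:k]]


# Toy vectors, invented for illustration only. Real vectors would come
# from the model, e.g. nlp(phrase).vector, and be computed just once.
toy_vectors = {
    "large building": [0.9, 0.8, 0.1],
    "long apartment": [0.7, 0.9, 0.2],
    "small dog": [0.1, 0.2, 0.9],
}

index = build_index(toy_vectors)
results = most_similar(index, [1.0, 0.9, 0.1], k=2)
for phrase, score in results:
    print(f"{phrase}: {score:.3f}")
```

A brute-force scan like this is linear in the number of stored phrases, which is fine for a few thousand entries. An approximate index such as Annoy becomes worthwhile when the collection is much larger.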

Also, I generally wouldn't call this finding "synonyms" since every example you gave is multiple words. You're really looking for similar phrases. "Synonyms" would usually imply single words, like you'd find in a thesaurus, but that won't help you here.

Upvotes: 0

steve2020

Reputation: 362

You could try using Beautiful Soup to parse data from an online thesaurus, or use a Python module such as py-thesaurus: https://pypi.org/project/py-thesaurus/

from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq
from urllib.error import HTTPError


def find_synonym(string):
    """Print the synonyms that thesaurus.plus lists for a word or phrase."""
    # Remove whitespace before and after the word and use underscores between words
    fixed_string = string.strip().replace(" ", "_")
    print(f"{fixed_string}:")

    # Set the URL using the amended string
    my_url = f'https://thesaurus.plus/thesaurus/{fixed_string}'

    try:
        # Open and read the HTML
        uClient = uReq(my_url)
        page_html = uClient.read()
        uClient.close()
    except HTTPError:
        if "_" in fixed_string:
            print("Phrase not found! Please try a different phrase.")
        else:
            print("Word not found! Please try a different word.")
        return

    # Parse the HTML and locate the list of results
    page_soup = soup(page_html, "html.parser")
    word_boxes = page_soup.find("ul", {"class": "list paper"})
    if word_boxes is None:
        # The page has no result list, or the site layout has changed
        print("No synonym list found on the page.")
        return
    results = word_boxes.find_all("div", "list_item")

    # Iterate over results and print
    for result in results:
        print(result.text)


if __name__ == "__main__":
    find_synonym("hello ")
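
One detail to watch in the function above is that the phrase goes into the URL as typed, so repeated spaces or non-ASCII characters may produce a URL the site does not recognise. Below is a small sketch of a safer URL builder using only the standard library. The name build_lookup_url is hypothetical, and whether the site accepts percent-encoded paths is an assumption I have not verified.

```python
from urllib.parse import quote

# Same site as in the answer above
BASE_URL = "https://thesaurus.plus/thesaurus/"


def build_lookup_url(phrase):
    """Normalize a word or phrase and return a percent-encoded lookup URL."""
    # split() with no arguments drops leading/trailing whitespace and
    # collapses runs of whitespace, so words are joined by one underscore
    slug = "_".join(phrase.split())
    # Percent-encode every character that is not safe in a URL path segment
    return BASE_URL + quote(slug, safe="")


url = build_lookup_url("  tall   building ")
print(url)
```

The returned string can be passed to uReq in place of my_url.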

Upvotes: 2
