Reputation: 934
I want to find synonyms of words and phrases.
For example, if the phrase is "tall building", I want to find all similar phrases, such as "long apartment", "large building", etc.
I used spaCy:
    import en_core_web_sm
    nlp = en_core_web_sm.load()
    # mytokens is my list of candidate phrases
    for i in range(len(mytokens)):
        nlp('tall building').similarity(nlp(mytokens[i]))
I can't use this because it takes a lot of time, and I can't use PhraseMatcher for this either.
Please help me.
Thanks in advance.
Upvotes: 0
Views: 4723
Reputation: 15593
So it's a little hard to tell from your example, but it looks like you're creating a new spaCy doc in every iteration of your loop, which will be slow. You should do something like this instead:
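    import spacy
    # The 'en' shortcut was removed in spaCy v3, so load the model by its full name.
    nlp = spacy.load('en_core_web_sm')
    # Build the query doc once, outside the loop
    query = nlp('tall building')
    scores = []
    for token in mytokens:
        scores.append(query.similarity(nlp(token)))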
This way spaCy only has to create the query doc once.
If you want to make repeated queries, you should compute the vector for each doc once and store it in a nearest-neighbor index such as Annoy (or similar), so you can look up the most similar docs quickly.
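As a rough illustration of the "compute vectors once, then query many times" idea, here is a minimal brute-force sketch using only the standard library. The toy vectors and the names `toy_embed`, `build_index`, and `most_similar` are made up for this example; with spaCy you would presumably use `nlp(phrase).vector` as the embedding, and a library like Annoy would replace the linear scan once the list gets large.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(phrases, embed):
    """Embed every candidate phrase once, up front."""
    return [(phrase, embed(phrase)) for phrase in phrases]


def most_similar(query_vec, index, k=3):
    """Return the k (phrase, score) pairs most similar to query_vec."""
    scored = ((cosine(query_vec, vec), phrase) for phrase, vec in index)
    return [(phrase, score) for score, phrase in heapq.nlargest(k, scored)]


# Hypothetical 3-dimensional vectors standing in for real embeddings.
TOY_VECTORS = {
    "tall building": [0.9, 0.8, 0.1],
    "large building": [0.8, 0.9, 0.2],
    "long apartment": [0.7, 0.6, 0.3],
    "red apple": [0.1, 0.2, 0.9],
}


def toy_embed(phrase):
    return TOY_VECTORS[phrase]


index = build_index(["large building", "long apartment", "red apple"], toy_embed)
results = most_similar(toy_embed("tall building"), index, k=2)
for phrase, score in results:
    print(f"{phrase}: {score:.3f}")
```

The index is built once, so each new query only costs one embedding plus the similarity scan, rather than re-embedding every candidate.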
Also, I generally wouldn't call this finding "synonyms" since every example you gave is multiple words. You're really looking for similar phrases. "Synonyms" would usually imply single words, like you'd find in a thesaurus, but that won't help you here.
Upvotes: 0
Reputation: 362
You could try using Beautiful Soup to parse data from an online thesaurus, or use a Python module such as py-thesaurus (https://pypi.org/project/py-thesaurus/):
    from bs4 import BeautifulSoup as soup
    from urllib.request import urlopen as uReq
    from urllib.error import HTTPError


    def find_synonym(string):
        """Print synonyms for a word or phrase, scraped from thesaurus.plus."""
        try:
            # Remove whitespace before and after the word and use underscores between words
            stripped_string = string.strip()
            fixed_string = stripped_string.replace(" ", "_")
            print(f"{fixed_string}:")

            # Set the URL using the amended string
            my_url = f'https://thesaurus.plus/thesaurus/{fixed_string}'

            # Open and read the HTML
            uClient = uReq(my_url)
            page_html = uClient.read()
            uClient.close()

            # Parse the HTML and locate the list of results
            page_soup = soup(page_html, "html.parser")
            word_boxes = page_soup.find("ul", {"class": "list paper"})
            if word_boxes is None:
                # The page loaded but did not contain the expected list
                print("No results found.")
                return
            results = word_boxes.find_all("div", "list_item")

            # Iterate over results and print
            for result in results:
                print(result.text)
        except HTTPError:
            if "_" in fixed_string:
                print("Phrase not found! Please try a different phrase.")
            else:
                print("Word not found! Please try a different word.")


    if __name__ == "__main__":
        find_synonym("hello ")
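If you want to check the URL-building step without hitting the network, it can be pulled out into its own small function. This is only a sketch: `build_thesaurus_url` is a name I made up, and unlike the code above it collapses runs of whitespace into a single underscore and percent-encodes anything that isn't URL-safe.

```python
from urllib.parse import quote

# Same base URL as in the scraper above
BASE_URL = "https://thesaurus.plus/thesaurus/"


def build_thesaurus_url(phrase):
    """Turn a word or phrase into a lookup URL.

    Leading/trailing whitespace is dropped and words are joined with
    underscores; quote() escapes any characters that are not URL-safe.
    """
    slug = "_".join(phrase.split())
    return BASE_URL + quote(slug)


print(build_thesaurus_url("hello "))
print(build_thesaurus_url("  tall   building "))
```

`find_synonym` could then call this helper instead of building the f-string inline.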
Upvotes: 2