Reputation: 127
I would like to know how to find all the variations of a word, or the words that are related or very similar to the original word, in Python.
An example of the sort of thing I am looking for is like this:
word = "summary"  # any word
word_variations = find_variations_of_word(word)  # the function I want to know how to write: it should find all the variations of a word
print(word_variations)
# What it should print: ["summaries", "summarize", "summarizing", "summarized"]
This is just an example of what the code should do. I have seen other similar questions on this topic, but none of the answers were accurate enough. I found some code and adapted it, and it kind of works, but not the way I would like it to:
import nltk
from nltk.corpus import wordnet
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

def find_inflections(word):
    inflections = []
    for synset in wordnet.synsets(word):  # Find all synsets for the word
        for lemma in synset.lemmas():  # Find all lemmas for each synset
            inflected_form = lemma.name().replace("_", " ")  # Get the inflected form of the lemma
            if inflected_form != word:  # Only add the inflected form if it's different from the original word
                inflections.append(inflected_form)
    return inflections
word = "summary"
inflections = find_inflections(word)
print(inflections)
# Output: ['sum-up', 'drumhead', 'compendious', 'compact', 'succinct']
# What the Output should be: ["summaries", "summarize", "summarizing", "summarized"]
Upvotes: 2
Views: 332
Reputation: 51
This probably isn't of any use to you, but it may help someone else who finds this through a search.
If the aim is just to find the words, rather than specifically to use a machine-learning approach to the problem, you could try using a regular expression (regex).
W3Schools seems to cover enough to get the result you want, and there is a more technical overview on python.org.
To search case-insensitively for the specific words you listed, the following would work:
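import re

string = "A SUMMARY ON SUMMATION: " \
         "We use summaries to summarize. This action is summarizing. " \
         "Once the action is complete things have been summarized."
# Match "summ" followed by any run of letters, ignoring case
occurrences = re.findall(r"summ[a-zA-Z]*", string, re.IGNORECASE)
print(occurrences)
# Output: ['SUMMARY', 'SUMMATION', 'summaries', 'summarize', 'summarizing', 'summarized']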
However, depending on your precise needs, you may need to modify the regular expression, as this pattern also matches unrelated words that begin with "summ", such as 'summation', 'summer' and 'summon'.
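One way to tighten the pattern, as a rough sketch: anchor it on a word boundary and use the longer stem "summar", so that words which only share the first few letters are skipped. The helper name find_word_family and the choice of stem are my own assumptions for illustration, not part of any library, and the two extra sentences in the sample text are added just to show the filtering.

```python
import re

def find_word_family(stem, text):
    """Return the distinct words in `text` that start with `stem` (case-insensitive)."""
    # \b anchors the match at the start of a word, so the stem cannot
    # match in the middle of a longer word.
    pattern = re.compile(r"\b" + re.escape(stem) + r"[a-z]*", re.IGNORECASE)
    seen = []
    for match in pattern.findall(text):
        word = match.lower()
        if word not in seen:  # keep first-seen order, drop duplicates
            seen.append(word)
    return seen

text = ("A SUMMARY ON SUMMATION: "
        "We use summaries to summarize. This action is summarizing. "
        "Once the action is complete things have been summarized. "
        "Last summer we had to summon help.")

print(find_word_family("summar", text))
# Output: ['summary', 'summaries', 'summarize', 'summarizing', 'summarized']
```

This only finds forms that actually occur in the text being searched, and it relies on you picking a stem by hand, so it will not generate variations of an arbitrary word on its own.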
I'm not very good at regex, but it can be a powerful tool if you know precisely what you are looking for and spend a little time crafting the right expression.
Sorry, this probably isn't relevant to your circumstances, but good luck.
Upvotes: 2