Reputation: 491
I have a poem and I want the Python code to just print those words which are rhyming with each other.
So far I am able to tokenize the poem with wordpunct_tokenize() and to build a list from cmudict.entries() with elements as the last word of each line and its pronunciation. I am stuck on the next step: how should I match those pronunciations? In all, my major task is to find out whether two given words rhyme. If they rhyme, return True, else False.
Upvotes: 22
Views: 18715
Reputation: 26
A little late to the party, but I was working on a project that needed to find a large number of rhyming lines. I started from this question thread and was inspired by kender's accepted answer. One thing worth knowing: when you are identifying thousands of rhymes, a dictionary lookup is much faster than scanning the full entry list for every word.
This is also updated for Python 3.x.
Store the CMU pronouncing dictionary as a JSON file, using an init function and a helper that converts the entries from a list of tuples to a dict.
import json

# global cache of the pronouncing dictionary, filled by require_rhyme_dict()
json_entries = None

def tup2dict(tup, di):
    # group (word, pronunciation) pairs into {word: [pronunciation, ...]}
    for a, b in tup:
        di.setdefault(a, []).append(b)
    return di

def init_cmu(args):
    import nltk
    nltk.download('cmudict')
    nltk.corpus.cmudict.ensure_loaded()
    cmu_entries = nltk.corpus.cmudict.entries()
    cmu_dict = dict()
    tup2dict(cmu_entries, cmu_dict)
    # the ./maps directory must already exist
    with open('./maps/cmu.json', 'w') as convert_file:
        convert_file.write(json.dumps(cmu_dict))
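To make the shape of the stored data concrete, here is a small sketch of the same grouping step on three hand-written entries in CMU style. The sample entries and the name group_entries are assumptions for illustration and are not read from NLTK. It also checks that the grouped dict survives a JSON round trip, which is what the file on disk relies on.

```python
import json

# Hand-written (word, phonemes) pairs in the format of cmudict.entries();
# illustrative only, not loaded from NLTK.
sample_entries = [
    ("read", ["R", "IY1", "D"]),
    ("read", ["R", "EH1", "D"]),
    ("bed", ["B", "EH1", "D"]),
]

def group_entries(entries):
    """Group pronunciations by word: {word: [phonemes, ...]}."""
    grouped = {}
    for word, phonemes in entries:
        grouped.setdefault(word, []).append(phonemes)
    return grouped

grouped = group_entries(sample_entries)

# Lists of strings serialize to JSON arrays and come back unchanged.
restored = json.loads(json.dumps(grouped))
print(restored["read"])
```

A word with several pronunciations ends up with several phoneme lists under one key, so a single lookup returns all of them.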
If a function in your script needs to determine rhymes, you can call this function at the top of it to ensure you have access to the dictionary.
def require_rhyme_dict():
    global json_entries
    if json_entries:
        return
    try:
        jsonf = open('./maps/cmu.json', 'r')
    except OSError:
        # file missing or unreadable: json_entries stays None, run init_cmu() first
        pass
    else:
        # Global
        json_entries = dict(json.load(jsonf))
        jsonf.close()
        print('json_entries loaded.')
Finally, here is a modified version of kender's rhyme function that uses the dictionary and a different method of sub-word checking. (I hit an edge case where the rhyme wasn't 'lame', but the words weren't allowed to rhyme because of the check on the index where one word begins inside the other and the length difference. I can't recall the example exactly; it was rare.)
When called thousands of times in a row, the following function is much faster than iterating through all the NLTK CMU entries to find a word's pronunciation. The level, as before, is how many pronunciation symbols (phonemes), counted from the end of each word, must match to constitute a rhyme. In this example, if a word has multiple pronunciations, all of them are checked for each word.
def isContainSameWord(word1, word2):
    # True when one word is a substring of the other (e.g. 'glue' / 'unglue')
    return word1 in word2 or word2 in word1

def isRhyme(word1, word2, level):
    require_rhyme_dict()
    global json_entries
    if isContainSameWord(word1, word2):
        return False
    word1_syllable_arrs = json_entries.get(word1)
    word2_syllables_arrs = json_entries.get(word2)
    if not word1_syllable_arrs or not word2_syllables_arrs:
        return False
    # compare every pronunciation of word1 with every pronunciation of word2
    for a in word1_syllable_arrs:
        for b in word2_syllables_arrs:
            if a[-level:] == b[-level:]:
                return True
    return False
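To see how the level changes the outcome, here is a minimal self-contained sketch of the same tail comparison. It uses a tiny hand-written pronunciation table instead of the JSON file; the table contents and the name tails_match are assumptions for illustration.

```python
# Hand-written CMU-style pronunciations (an assumption for illustration;
# a real run would load them from the dictionary).
SAMPLE = {
    "timing": [["T", "AY1", "M", "IH0", "NG"]],
    "rhyming": [["R", "AY1", "M", "IH0", "NG"]],
    "running": [["R", "AH1", "N", "IH0", "NG"]],
}

def tails_match(word1, word2, level, pron=SAMPLE):
    """True if any pair of pronunciations shares its last `level` phonemes."""
    for a in pron.get(word1, []):
        for b in pron.get(word2, []):
            if a[-level:] == b[-level:]:
                return True
    return False

print(tails_match("timing", "running", 2))  # only the shared "-ing" is compared
print(tails_match("timing", "running", 3))  # "M" vs "N" now falls inside the tail
print(tails_match("timing", "rhyming", 4))
```

A low level accepts weak rhymes that only share a common ending, while a higher level demands a longer shared tail. The level should be at least 1, because a[-0:] is the whole list rather than an empty tail.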
Upvotes: 1
Reputation: 87201
Here I found a way to find rhymes to a given word using NLTK:
import nltk

def rhyme(inp, level):
    entries = nltk.corpus.cmudict.entries()
    # every pronunciation listed for the input word
    syllables = [(word, syl) for word, syl in entries if word == inp]
    rhymes = []
    for (word, syllable) in syllables:
        # collect words whose last `level` phonemes match this pronunciation
        rhymes += [word for word, pron in entries if pron[-level:] == syllable[-level:]]
    return set(rhymes)
where inp is a word and level means how good the rhyme should be (the number of trailing phonemes that must match).
So you could use this function, and to check whether two words rhyme, just check if one is in the other's set of allowed rhymes:
def doTheyRhyme(word1, word2):
    # first, we don't want to report 'glue' and 'unglue' as rhyming words
    # those kind of rhymes are LAME
    if word1.find(word2) == len(word1) - len(word2):
        return False
    if word2.find(word1) == len(word2) - len(word1):
        return False
    return word1 in rhyme(word2, 1)
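One caveat about a suffix test written with find(): str.find() returns -1 when the substring is absent, and len(word1) - len(word2) is also -1 whenever word2 is exactly one character longer than word1. Unrelated pairs such as 'cat' and 'that' are therefore rejected before their pronunciations are ever compared. A minimal sketch of the difference, with str.endswith() as one possible alternative (both helper names are hypothetical):

```python
def is_suffix_by_find(word1, word2):
    """Suffix test using find(), as in the check above."""
    return word1.find(word2) == len(word1) - len(word2)

def is_suffix(word1, word2):
    """Suffix test using endswith(), which has no -1 coincidence."""
    return word1.endswith(word2)

# Intended case: 'glue' really is the ending of 'unglue'.
print(is_suffix_by_find("unglue", "glue"), is_suffix("unglue", "glue"))

# Coincidence: find() gives -1 and 3 - 4 is also -1, so the first test fires.
print(is_suffix_by_find("cat", "that"), is_suffix("cat", "that"))
```

Whether to exclude sub-word rhymes at all is a design choice; the point here is only that the two tests disagree on pairs whose lengths differ by one.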
Upvotes: 14
Reputation: 1600
The Pronouncing library does a great job of that. No hacking, quick to load, and it is based on the CMU Pronouncing Dictionary, so it's reliable.
https://pypi.python.org/pypi/pronouncing
From their documentation:
>>> import pronouncing
>>> pronouncing.rhymes("climbing")
['diming', 'liming', 'priming', 'rhyming', 'timing']
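If a fixed number of trailing phonemes feels arbitrary, another common notion of rhyme compares everything from the last stressed vowel to the end of the word. The sketch below is a plain-Python illustration of that idea on hand-written CMU-style phoneme lists. It does not call the library, and the helper name and sample pronunciations are assumptions for illustration.

```python
def stressed_tail(phones):
    """Return the phonemes from the last stressed vowel (stress digit 1 or 2) onward."""
    for i in range(len(phones) - 1, -1, -1):
        if phones[i][-1] in "12":
            return phones[i:]
    return phones  # no stressed vowel found: fall back to the whole word

# Hand-written pronunciations in CMU notation (illustrative assumptions).
climbing = "K L AY1 M IH0 NG".split()
timing = "T AY1 M IH0 NG".split()
running = "R AH1 N IH0 NG".split()

print(stressed_tail(climbing) == stressed_tail(timing))
print(stressed_tail(climbing) == stressed_tail(running))
```

Under this definition 'climbing' and 'timing' share the tail AY1 M IH0 NG, while 'running' differs at the stressed vowel even though all three end in '-ing'.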
Upvotes: 21
Reputation: 6806
Use soundex or double metaphone to find out if they rhyme. NLTK doesn't seem to implement these but a quick Google search showed some implementations.
Upvotes: 2