learner57

Reputation: 491

Find rhyme using NLTK in Python

I have a poem, and I want Python code that prints only the words that rhyme with each other.

So far I am able to:

  1. Split the poem's sentences into words using wordpunct_tokenize()
  2. Clean the words by removing the punctuation marks
  3. Store the last word of each sentence of the poem in a list
  4. Generate another list using cmudict.entries() whose elements are those last words paired with their pronunciations.
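
Steps 1 to 3 can be sketched without downloading any corpus. This is a minimal sketch that assumes each line of the poem ends with the word you want to rhyme; the regular expression is the same pattern NLTK's wordpunct_tokenize() uses, and last_words is a hypothetical helper name.

```python
import re

# Same regular expression NLTK's wordpunct_tokenize() uses:
# runs of word characters, or runs of non-space punctuation.
TOKEN_RE = re.compile(r"\w+|[^\w\s]+")


def last_words(poem):
    """Return the lowercased last word of every non-empty line (hypothetical helper)."""
    result = []
    for line in poem.splitlines():
        # Keep only alphabetic tokens, which drops the punctuation marks.
        words = [tok.lower() for tok in TOKEN_RE.findall(line) if tok.isalpha()]
        if words:
            result.append(words[-1])
    return result


poem = """The cat sat on the mat,
It wore a little hat.

The end!"""
print(last_words(poem))  # ['mat', 'hat', 'end']
```

Lowercasing matters because the CMU dictionary keys are lowercase.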

I am stuck on the next step: how should I match those pronunciations? Overall, my main task is to find out whether two given words rhyme, returning True if they do and False otherwise.
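
One common way to match pronunciations is to compare the "rhyming part" of each word: the phonemes from the last stressed vowel to the end. In the CMU data, vowels carry a stress digit (1 primary, 2 secondary, 0 unstressed). The sketch below assumes a dict shaped like nltk.corpus.cmudict.dict(), i.e. {word: [pronunciation, ...]}; the small PRON table is hand-written sample data standing in for the real dictionary, and rhyming_part / words_rhyme are illustrative names.

```python
def rhyming_part(phones):
    """Phonemes from the last stressed vowel (stress digit 1 or 2) to the end."""
    for i in range(len(phones) - 1, -1, -1):
        if phones[i][-1] in "12":
            return tuple(phones[i:])
    # No stressed vowel found: fall back to the whole pronunciation.
    return tuple(phones)


def words_rhyme(word1, word2, pron_dict):
    """True if some pronunciation of word1 shares its rhyming part with word2."""
    w1, w2 = word1.lower(), word2.lower()
    if w1 == w2:
        return False  # a word trivially "rhymes" with itself
    parts1 = {rhyming_part(p) for p in pron_dict.get(w1, [])}
    parts2 = {rhyming_part(p) for p in pron_dict.get(w2, [])}
    return bool(parts1 & parts2)  # unknown words give empty sets -> False


# Hand-written sample in cmudict.dict() format (stand-in for the real data).
PRON = {
    "cat": [["K", "AE1", "T"]],
    "hat": [["HH", "AE1", "T"]],
    "dog": [["D", "AO1", "G"]],
    "day": [["D", "EY1"]],
    "away": [["AH0", "W", "EY1"]],
}

print(words_rhyme("cat", "hat", PRON))   # True
print(words_rhyme("cat", "dog", PRON))   # False
print(words_rhyme("Day", "away", PRON))  # True
```

With the real data you would pass nltk.corpus.cmudict.dict() as pron_dict. Unlike a fixed number of trailing phonemes, the rhyming part adapts to where the stress falls in each word.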

Upvotes: 22

Views: 18715

Answers (4)

Bryn Beaudry

Reputation: 26

A little late to the party, but I was building a project that finds many, many rhyming lines for processing. I started from this question thread, and kender's accepted answer sparked the idea. One thing worth knowing: when identifying thousands of rhymes, a dictionary lookup is much faster than scanning the whole entry list each time.

This version is also updated for Python 3.x.

First, store the CMU Pronouncing Dictionary as a JSON file, using an init function and a helper that converts the entries from (word, pronunciation) tuples into a dict.

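import json
import os

import nltk

# global cache, filled by require_rhyme_dict()
json_entries = None

def tup2dict(tup, di):
    # Group (word, pronunciation) pairs into {word: [pron1, pron2, ...]}
    for a, b in tup:
        di.setdefault(a, []).append(b)
    return di

def init_cmu():
    nltk.download('cmudict')
    nltk.corpus.cmudict.ensure_loaded()
    cmu_entries = nltk.corpus.cmudict.entries()
    cmu_dict = dict()
    tup2dict(cmu_entries, cmu_dict)
    # Create the output folder if it does not exist yet
    os.makedirs('./maps', exist_ok=True)
    with open('./maps/cmu.json', 'w') as convert_file:
        json.dump(cmu_dict, convert_file)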
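
The caching idea can be checked on a toy sample without downloading the corpus. This sketch writes hand-made (word, pronunciation) pairs, the same shape cmudict.entries() returns, to a temporary JSON file and reads them back. It uses only the standard library, and the sample entries are illustrative.

```python
import json
import os
import tempfile


def tup2dict(tup, di):
    # Group (word, pronunciation) pairs into {word: [pron1, pron2, ...]}.
    for a, b in tup:
        di.setdefault(a, []).append(b)
    return di


# Sample pairs in cmudict.entries() format; "read" has two pronunciations.
entries = [
    ("read", ["R", "EH1", "D"]),
    ("read", ["R", "IY1", "D"]),
    ("red", ["R", "EH1", "D"]),
]
cmu_dict = tup2dict(entries, {})

path = os.path.join(tempfile.mkdtemp(), "cmu.json")
with open(path, "w") as f:
    json.dump(cmu_dict, f)
with open(path) as f:
    loaded = json.load(f)

# Each lookup is now a single dict access instead of a scan over all entries.
print(loaded["read"])  # [['R', 'EH1', 'D'], ['R', 'IY1', 'D']]
```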

If a function in your script needs to determine rhymes, call this function at the top of it to make sure the dictionary is loaded.

def require_rhyme_dict():
    global json_entries
    if json_entries:
        return
    # Raises FileNotFoundError if init_cmu() has not created the file yet,
    # instead of silently leaving json_entries as None
    with open('./maps/cmu.json', 'r') as jsonf:
        json_entries = json.load(jsonf)
    print('json_entries loaded.')
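
As an alternative to the global variable, the loaded dictionary can be memoized with functools.lru_cache so the JSON file is parsed only once per path. This is a sketch under that assumption; load_pron_dict is a hypothetical name, and a missing file raises FileNotFoundError rather than being silently ignored.

```python
import functools
import json
import os
import tempfile


@functools.lru_cache(maxsize=None)
def load_pron_dict(path):
    """Parse the JSON cache once; later calls with the same path reuse the result."""
    with open(path) as f:
        return json.load(f)


# Demo with a throwaway file standing in for ./maps/cmu.json.
path = os.path.join(tempfile.mkdtemp(), "cmu.json")
with open(path, "w") as f:
    json.dump({"cat": [["K", "AE1", "T"]]}, f)

first = load_pron_dict(path)
second = load_pron_dict(path)
print(first is second)  # True: the second call was served from the cache
```

Note that every caller shares the same dict object, so it should be treated as read-only.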

Finally, here is a modified version of kender's rhyme function that uses the dictionary and a different sub-word check. (I hit a rare edge case where the rhyme wasn't 'lame', but kender's check, which compares the index where one word is found against the difference in word lengths, still rejected the pair. I can't recall the exact example.)

When it is called thousands of times in a row, this function is much faster than iterating through all of NLTK's CMU entries for every lookup, because finding a word's pronunciations is a single dictionary access. As before, level is the number of trailing phonemes (ARPAbet symbols in the CMU entries) that must match for the two words to count as a rhyme. If a word has multiple pronunciations, every combination of pronunciations is checked.

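def isContainSameWord(word1, word2):
    # Reject pairs where one word contains the other (e.g. 'glue' / 'unglue')
    return word1 in word2 or word2 in word1

def isRhyme(word1, word2, level):
    global json_entries
    require_rhyme_dict()
    # CMU dictionary keys are lowercase
    word1, word2 = word1.lower(), word2.lower()
    if isContainSameWord(word1, word2):
        return False
    # Each value is a list of pronunciations, each a list of phonemes
    word1_syllable_arrs = json_entries.get(word1)
    word2_syllables_arrs = json_entries.get(word2)
    # Words missing from the dictionary cannot be checked
    if not word1_syllable_arrs or not word2_syllables_arrs:
        return False
    # Try every pronunciation pair, comparing the last `level` phonemes
    for a in word1_syllable_arrs:
        for b in word2_syllables_arrs:
            if a[-level:] == b[-level:]:
                return True
    return False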
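
To see what level does, here is the same tail comparison on hand-written pronunciations (sample data, not read from the CMU file). Matching only the last phoneme is loose: "cat" and "sit" both end in T. Requiring two phonemes keeps "cat"/"bat" and rejects "cat"/"sit". tails_match is an illustrative helper that mirrors the nested loop in isRhyme.

```python
def tails_match(prons1, prons2, level):
    """True if any pair of pronunciations shares its last `level` phonemes."""
    # level should be at least 1: a[-0:] is the whole list, not an empty tail.
    return any(a[-level:] == b[-level:] for a in prons1 for b in prons2)


cat = [["K", "AE1", "T"]]
bat = [["B", "AE1", "T"]]
sit = [["S", "IH1", "T"]]

print(tails_match(cat, sit, 1))  # True  (only the final T is compared)
print(tails_match(cat, sit, 2))  # False (AE1 T vs IH1 T)
print(tails_match(cat, bat, 2))  # True
```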

Upvotes: 1

kender

Reputation: 87201

Here is a way to find rhymes for a given word using NLTK:

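import nltk

def rhyme(inp, level):
    # Every (word, pronunciation) pair in the CMU Pronouncing Dictionary
    entries = nltk.corpus.cmudict.entries()
    # All pronunciations of the input word (a word can have several)
    syllables = [(word, syl) for word, syl in entries if word == inp]
    rhymes = []
    for (word, syllable) in syllables:
        # Collect every word whose last `level` phonemes match
        rhymes += [word for word, pron in entries if pron[-level:] == syllable[-level:]]
    return set(rhymes)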

where inp is a word and level is the number of trailing phonemes that must match; a higher level means a stricter rhyme.
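
Because rhyme() scans every CMU entry on each call, repeated queries get slow. One option (a sketch, not part of the answer above) is to build a reverse index once, keyed by the last level phonemes, so each query becomes a couple of dictionary lookups. build_indexes and rhyme_indexed are hypothetical names, and the sample entries stand in for nltk.corpus.cmudict.entries().

```python
from collections import defaultdict


def build_indexes(entries, level):
    """Return ({word: [prons]}, {tail: {words}}) built in a single pass."""
    prons = defaultdict(list)
    by_tail = defaultdict(set)
    for word, pron in entries:
        prons[word].append(pron)
        by_tail[tuple(pron[-level:])].add(word)
    return prons, by_tail


def rhyme_indexed(inp, prons, by_tail, level):
    """Same set rhyme(inp, level) returns, for the level the index was built with."""
    rhymes = set()
    for pron in prons.get(inp, []):
        rhymes |= by_tail.get(tuple(pron[-level:]), set())
    return rhymes


# Sample entries in cmudict.entries() format (stand-in for the real corpus).
entries = [
    ("cat", ["K", "AE1", "T"]),
    ("hat", ["HH", "AE1", "T"]),
    ("dog", ["D", "AO1", "G"]),
]
prons, by_tail = build_indexes(entries, 2)
print(sorted(rhyme_indexed("cat", prons, by_tail, 2)))  # ['cat', 'hat']
```

The trade-off is memory: the index holds every word once per pronunciation, and a separate index is needed for each level you want to query.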

You can then use this function to check whether two words rhyme: just check whether one is in the other's set of rhymes:

def doTheyRhyme(word1, word2):
    # first, we don't want to report 'glue' and 'unglue' as rhyming words
    # those kinds of rhymes are LAME
    # endswith() avoids a str.find() pitfall: find() returns -1 when the
    # word is absent, which equals the length difference for e.g. 'cat'/'chat'
    if word1.endswith(word2) or word2.endswith(word1):
        return False

    return word1 in rhyme(word2, 1)
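
A suffix test based only on str.find can misfire in two ways. When the other word is not found, find returns -1, which also equals len(word1) - len(word2) whenever word2 is one letter longer. And find reports the first occurrence, so a suffix that also appears earlier in the word is missed. That is one plausible source of the rare edge case Bryn Beaudry describes. The helper names below are illustrative; str.endswith states the intent directly.

```python
def is_suffix_find(word1, word2):
    # find() returns -1 when word2 is absent, which can equal the length difference.
    return word1.find(word2) == len(word1) - len(word2)


def is_suffix_endswith(word1, word2):
    return word1.endswith(word2)


print(is_suffix_find("cat", "chat"))         # True, although "chat" is not a suffix of "cat"
print(is_suffix_endswith("cat", "chat"))     # False
print(is_suffix_find("tartar", "tar"))       # False, although "tar" is a suffix
print(is_suffix_endswith("tartar", "tar"))   # True
print(is_suffix_endswith("unglue", "glue"))  # True
```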

Upvotes: 14

JeffThompson

Reputation: 1600

The Pronouncing library does a great job of this. No hacking, it's quick to load, and it's based on the CMU Pronouncing Dictionary, so it's reliable.

https://pypi.python.org/pypi/pronouncing

From their documentation:

>>> import pronouncing
>>> pronouncing.rhymes("climbing")
['diming', 'liming', 'priming', 'rhyming', 'timing']

Upvotes: 21

Christian Alis

Reputation: 6806

Use Soundex or Double Metaphone to find out whether they rhyme. NLTK doesn't seem to implement these, but a quick Google search turned up some implementations.

Upvotes: 2
