aks

Reputation: 31

Number of syllables for words in a text

I have the following code excerpt to find the number of syllables for every word in the input text 'sample.txt' using NLTK:

   import re
   import nltk
   from curses.ascii import isdigit
   from nltk.corpus import cmudict
   import nltk.data
   import pprint

   d = cmudict.dict()

   tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
   fp = open("sample.txt")
   data = fp.read()
   tokens = nltk.wordpunct_tokenize(data)
   text = nltk.Text(tokens)
   words = [w.lower() for w in text]
   print words #to print all the words in input text
   regexp = "[A-Za-z]+"
   exp = re.compile(regexp)

   def nsyl(word):
      return max([len([y for y in x if isdigit(y[-1])]) for x in d[word]])

   sum1 = 0
   count = 0
   count1 = 0
   for a in words:
      if exp.match(a):
          print a
          print "no of syllables:", nsyl(a)
          sum1 = sum1 + nsyl(a)
          print "sum of syllables:", sum1
          if nsyl(a) < 3:
              count = count + 1
          else:
              count1 = count1 + 1

   print "no of words with syll count less than 3:", count
   print "no of complex words:", count1

This code looks up each input word in the CMU dictionary and gives the number of syllables for the word. But it fails with an error when a word is not found in the dictionary, or when I use a proper noun in the input. I want to check whether the word exists in the dictionary and, if it doesn't, skip it and continue with the next word. How do I do this?

Upvotes: 3

Views: 5308

Answers (1)

I82Much

Reputation: 27326

I'm guessing the problem is a KeyError. Replace your definition with

def nsyl(word):
  lowercase = word.lower()
  if lowercase not in d:
     return -1
  else:
     return max([len([y for y in x if isdigit(y[-1])]) for x in d[lowercase]])

Alternatively, you can check whether the word is in the dictionary before calling nsyl, and then nsyl itself doesn't have to handle missing words.
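
A minimal sketch of checking the dictionary before calling nsyl, assuming Python 3 and a tiny hand-written stand-in for `cmudict.dict()` so that it runs without the NLTK data. The helper name `count_syllables` and the sample entries are illustrative and not from the question; with NLTK installed, `d = cmudict.dict()` would presumably take the place of the stand-in.

```python
# Hand-written stand-in for cmudict.dict() (assumption, for illustration only):
# each lowercase word maps to a list of pronunciations, and vowel phonemes
# end in a stress digit (0, 1 or 2).
d = {
    "hello": [["HH", "AH0", "L", "OW1"], ["HH", "EH0", "L", "OW1"]],
    "syllable": [["S", "IH1", "L", "AH0", "B", "AH0", "L"]],
    "cat": [["K", "AE1", "T"]],
}


def nsyl(word):
    """Largest syllable count over all pronunciations of a known word."""
    return max(sum(1 for ph in pron if ph[-1].isdigit()) for pron in d[word])


def count_syllables(words):
    """Hypothetical helper: tally syllables, skipping words missing from d."""
    total = simple = complex_words = 0
    skipped = []
    for w in words:
        w = w.lower()
        if w not in d:          # membership test avoids the KeyError
            skipped.append(w)
            continue            # move on to the next word
        n = nsyl(w)
        total += n
        if n < 3:
            simple += 1
        else:
            complex_words += 1
    return total, simple, complex_words, skipped


result = count_syllables(["Hello", "syllable", "cat", "Zyxwv"])
print(result)
```

Skipping unknown words in the loop also keeps a -1 sentinel out of the running sum and out of the "fewer than 3 syllables" count, which would otherwise need an extra check at the call site.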

Upvotes: 3
