Number of syllables for words in a text

Question

I have the following code excerpt, which uses NLTK to find the number of syllables for every word in the input text 'sample.txt':

    import re
    import nltk
    from nltk.corpus import cmudict

    # Maps each lowercase word to a list of pronunciations (lists of phonemes).
    d = cmudict.dict()

    with open("sample.txt") as fp:
        data = fp.read()
    tokens = nltk.wordpunct_tokenize(data)
    words = [w.lower() for w in tokens]
    print(words)  # print all the words in the input text
    exp = re.compile("[A-Za-z]+")

    def nsyl(word):
        # Vowel phonemes end in a stress digit, so counting them counts syllables.
        # Take the largest count over all pronunciations of the word.
        return max(len([y for y in x if y[-1].isdigit()]) for x in d[word])

    sum1 = 0
    count = 0
    count1 = 0
    for a in words:
        if exp.match(a):
            n = nsyl(a)
            print(a)
            print("no of syllables:", n)
            sum1 = sum1 + n
            print("sum of syllables:", sum1)
            if n < 3:
                count = count + 1
            else:
                count1 = count1 + 1

    print("no of words with syll count less than 3:", count)
    print("no of complex words:", count1)

This code looks up each input word in the CMU dictionary and gives the number of syllables for the word. But it fails with an error when a word is not found in the dictionary, for example when the input contains a proper noun. I want to check whether the word exists in the dictionary and, if it doesn't, skip it and move on to the next word. How do I do this?
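One possible way to get the skipping behaviour is to test for the word before indexing into the dictionary. The sketch below is written for Python 3 and is only an illustration: `SAMPLE_DICT` is a tiny hand-written stand-in in the same shape as the result of `cmudict.dict()` (word mapped to a list of phoneme lists), and `nsyl_or_none` and `count_syllables` are hypothetical helper names, not part of NLTK.

```python
import re

# Hand-written stand-in for cmudict.dict(), for illustration only:
# word -> list of pronunciations, each a list of phonemes.
# Vowel phonemes end in a stress digit (0, 1 or 2).
SAMPLE_DICT = {
    "hello": [["HH", "AH0", "L", "OW1"], ["HH", "EH0", "L", "OW1"]],
    "dictionary": [["D", "IH1", "K", "SH", "AH0", "N", "EH2", "R", "IY0"]],
    "text": [["T", "EH1", "K", "S", "T"]],
}

WORD_RE = re.compile("[A-Za-z]+")


def nsyl_or_none(word, pron_dict):
    """Return the syllable count for word, or None if it has no entry."""
    pronunciations = pron_dict.get(word.lower())
    if not pronunciations:
        return None
    # Count stress-marked phonemes; take the maximum over all pronunciations.
    return max(
        sum(1 for phoneme in pron if phoneme[-1].isdigit())
        for pron in pronunciations
    )


def count_syllables(words, pron_dict):
    """Tally syllables, skipping tokens that are missing from pron_dict."""
    total = simple = complex_words = 0
    skipped = []
    for w in words:
        if not WORD_RE.match(w):
            continue  # punctuation, digits, etc.
        n = nsyl_or_none(w, pron_dict)
        if n is None:
            skipped.append(w)  # not in the dictionary: skip and carry on
            continue
        total += n
        if n < 3:
            simple += 1
        else:
            complex_words += 1
    return {
        "total": total,
        "simple": simple,
        "complex": complex_words,
        "skipped": skipped,
    }


if __name__ == "__main__":
    print(count_syllables(["hello", "dictionary", "text", "Zyxwv", ","], SAMPLE_DICT))
```

With the real data, `cmudict.dict()` would be passed in place of `SAMPLE_DICT`. Wrapping the lookup in `try` / `except KeyError` and using `continue` in the handler should work just as well; `dict.get` is used here only to keep the loop body flat.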
