Reputation: 31
I have the following code excerpt to find the number of syllables for every word in the input file 'sample.txt' using NLTK:
from __future__ import print_function  # lets print(...) behave the same on Python 2 and 3
import re
import nltk
from curses.ascii import isdigit
from nltk.corpus import cmudict
import nltk.data
import pprint

d = cmudict.dict()
tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
fp = open("sample.txt")
data = fp.read()
tokens = nltk.wordpunct_tokenize(data)
text = nltk.Text(tokens)
words = [w.lower() for w in text]
print(words)  # print all the words in the input text

regexp = "[A-Za-z]+"
exp = re.compile(regexp)


def nsyl(word):
    # Count the phonemes ending in a stress digit (one per vowel, i.e. per syllable)
    # and take the maximum over all pronunciations of the word.
    return max([len([y for y in x if isdigit(y[-1])]) for x in d[word]])


sum1 = 0
count = 0
count1 = 0
for a in words:
    if exp.match(a):
        print(a)
        print("no of syllables:", nsyl(a))
        sum1 = sum1 + nsyl(a)
        print("sum of syllables:", sum1)
        if nsyl(a) < 3:
            count = count + 1
        else:
            count1 = count1 + 1
print("no of words with syll count less than 3:", count)
print("no of complex words:", count1)
This code matches each input word against the CMU dictionary and gives the number of syllables for that word. But it stops with an error whenever a word is not found in the dictionary, or when I use a proper noun in the input. I want to check whether the word exists in the dictionary and, if it doesn't, skip it and move on to the next word. How do I do this?
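To illustrate both how the counting works and where the error comes from, here is a minimal sketch that replaces cmudict.dict() with a tiny hand-written dictionary of the same shape: each word maps to a list of pronunciations, and each pronunciation is a list of phonemes whose vowels end in a stress digit (0, 1 or 2). The entries in MINI_DICT are illustrative assumptions, not copied from CMUdict, and str.isdigit is used instead of curses.ascii.isdigit so the sketch does not depend on curses (for the ASCII phoneme strings they give the same result). Looking up a word that is not a key raises KeyError.

```python
# Hypothetical stand-in for nltk.corpus.cmudict.dict(): same shape,
# made-up entries for illustration only.
MINI_DICT = {
    "cat": [["K", "AE1", "T"]],
    "banana": [["B", "AH0", "N", "AE1", "N", "AH0"]],
    "fire": [["F", "AY1", "ER0"], ["F", "AY1", "R"]],
}


def nsyl(word, d=MINI_DICT):
    """Return the syllable count of `word`, taking the max over its pronunciations.

    One syllable is counted for every phoneme whose last character is a
    stress digit, i.e. for every vowel phoneme.
    """
    return max(len([ph for ph in pron if ph[-1].isdigit()]) for pron in d[word])


print(nsyl("cat"))   # 1
print(nsyl("fire"))  # 2 (the two-syllable pronunciation wins)

try:
    nsyl("zyxwv")    # not a key in the dictionary
except KeyError as exc:
    print("KeyError:", exc)
```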
Upvotes: 3
Views: 5308
Reputation: 27326
I'm guessing the problem is a KeyError: d[word] fails when the word is not a key in the dictionary. Replace your definition with:
def nsyl(word):
    lowercase = word.lower()
    if lowercase not in d:
        return -1  # word not in CMUdict: tell the caller it is unknown
    else:
        return max([len([y for y in x if isdigit(y[-1])]) for x in d[lowercase]])
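With the -1 sentinel, the calling loop still has to test for it; otherwise -1 is added to the running sum and the unknown word is counted as having fewer than 3 syllables. A minimal sketch of such a loop, assuming a hypothetical mini dictionary in place of cmudict.dict() and a made-up word list; summarize is an illustrative helper, not part of NLTK:

```python
import re

# Hypothetical stand-in for cmudict.dict(); entries are illustrative only.
d = {
    "the": [["DH", "AH0"]],
    "banana": [["B", "AH0", "N", "AE1", "N", "AH0"]],
    "elephant": [["EH1", "L", "AH0", "F", "AH0", "N", "T"]],
}


def nsyl(word):
    """Syllable count from the dictionary, or -1 if the word is missing."""
    lowercase = word.lower()
    if lowercase not in d:
        return -1
    return max(len([y for y in x if y[-1].isdigit()]) for x in d[lowercase])


def summarize(words):
    """Return (total syllables, words under 3 syllables, complex words, skipped words)."""
    exp = re.compile("[A-Za-z]+")
    total = short = complex_words = 0
    skipped = []
    for a in words:
        if not exp.match(a):
            continue            # punctuation tokens such as "," or "."
        n = nsyl(a)
        if n == -1:
            skipped.append(a)   # not in the dictionary: skip it
            continue
        total += n
        if n < 3:
            short += 1
        else:
            complex_words += 1
    return total, short, complex_words, skipped


print(summarize(["The", "banana", ",", "Zelda", "elephant", "."]))
# (7, 1, 2, ['Zelda'])
```

Calling nsyl once per word and storing the result in n also avoids the three separate lookups in the original loop.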
With this version, skip any word for which nsyl returns -1 instead of adding it to your totals. Alternatively, you can check whether the word is in the dictionary before calling nsyl; then you don't have to handle missing words inside nsyl itself.
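The second approach, checking membership before calling nsyl, can keep the original nsyl unchanged and use continue to move on to the next word. A minimal sketch, again with a hypothetical mini dictionary standing in for cmudict.dict(); syllable_counts is an illustrative helper name:

```python
# Hypothetical stand-in for cmudict.dict(); entries are illustrative only.
d = {
    "cat": [["K", "AE1", "T"]],
    "banana": [["B", "AH0", "N", "AE1", "N", "AH0"]],
}


def nsyl(word):
    """Original-style definition: raises KeyError if `word` is not in `d`."""
    return max(len([y for y in x if y[-1].isdigit()]) for x in d[word])


def syllable_counts(words):
    """Map each dictionary word to its syllable count, skipping unknown words."""
    counts = {}
    for a in words:
        a = a.lower()
        if a not in d:
            continue  # proper noun or unknown word: go on to the next one
        counts[a] = nsyl(a)
    return counts


print(syllable_counts(["Cat", "Oslo", "banana"]))
```

Because the membership test runs first, nsyl is only ever called with keys that exist, so it can never raise KeyError here.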
Upvotes: 3