Reputation: 415
I am fairly new to Python and cannot find the bug in my code. I want to extract nouns using NLTK.
I have written the following code:
import nltk
sentence = "At eight o'clock on Thursday film morning word line test best beautiful Ram Aaron design"
tokens = nltk.word_tokenize(sentence)
tagged = nltk.pos_tag(tokens)
length = len(tagged) - 1
a = list()
for i in (0,length):
    log = (tagged[i][1][0] == 'N')
    if log == True:
        a.append(tagged[i][0])
When I run this, 'a' has only one element:
a
['detail']
I do not understand why.
When I run the loop body without the for loop, that is,
log = (tagged[i][1][0] == 'N')
if log == True:
    a.append(tagged[i][0])
and change the value of 'i' manually from 0 to 'length', I get the correct output, but with the for loop only the last element is returned. Can someone tell me what is going wrong with the for loop?
After the code runs, 'a' should be:
['Thursday', 'film', 'morning', 'word', 'line', 'test', 'Ram', 'Aaron', 'design']
Upvotes: 4
Views: 13519
Reputation: 303
Try this:
import nltk
sentence = "At eight o'clock on Thursday film morning word line test best beautiful Ram Aaron design"
tokens = nltk.word_tokenize(sentence)
tagged = nltk.pos_tag(tokens)
a = list()
# range(len(tagged)) visits every index, including the last one
for i in range(len(tagged)):
    log = (tagged[i][1][0] == 'N')
    if log == True:
        a.append(tagged[i][0])
print(a)
Upvotes: 0
Reputation: 122012
>>> from nltk import word_tokenize, pos_tag
>>> sentence = "At eight o'clock on Thursday film morning word line test best beautiful Ram Aaron design"
>>> nouns = [token for token, pos in pos_tag(word_tokenize(sentence)) if pos.startswith('N')]
>>> nouns
['Thursday', 'film', 'morning', 'word', 'line', 'test', 'Ram', 'Aaron', 'design']
Upvotes: 10
Reputation: 117856
This line will only loop twice:
for i in (0,length):
once with i = 0 and once with i = length.
Since length is len(tagged) - 1, range(length) would skip the last token, so what you want is
for i in range(len(tagged)):
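To see the difference concretely, here is a small sketch that needs no NLTK. The value length = 4 is arbitrary and chosen only for illustration.

```python
length = 4  # arbitrary value for illustration

# A tuple literal holds exactly the values written in it.
from_tuple = [i for i in (0, length)]

# range(length) counts 0, 1, ..., length - 1 (the stop value is excluded).
from_range = [i for i in range(length)]

# To include length itself, as needed when length = len(seq) - 1:
from_range_inclusive = [i for i in range(length + 1)]

print(from_tuple)            # [0, 4]
print(from_range)            # [0, 1, 2, 3]
print(from_range_inclusive)  # [0, 1, 2, 3, 4]
```

Only the tuple version jumps straight from the first value to the last, which matches the single-element result in the question.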
Upvotes: 0
Reputation: 76194
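for i in (0,length):
This iterates over two elements, zero and length. If you want to iterate over every number from zero up to length, use range. Note that range excludes its stop value, so with length = len(tagged) - 1 you need length + 1 to reach the last index:
for i in range(0, length + 1):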
Better yet, it's more idiomatic to directly iterate over the elements of a sequence, rather than its index. This will reduce the likelihood of typos like the one above.
for item in tagged:
    if item[1][0] == 'N':
        a.append(item[0])
Size-conscious users may even prefer the one-line list comprehension:
a = [item[0] for item in tagged if item[1][0] == 'N']
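As a rough sketch of the same idea with tuple unpacking, the example below runs without NLTK installed. The sample_tagged list is hand-written to stand in for the output of nltk.pos_tag, so its tags are illustrative assumptions rather than real tagger results, and extract_nouns is a hypothetical helper name.

```python
# Hand-written (word, tag) pairs standing in for nltk.pos_tag output;
# the tags are illustrative assumptions, not real tagger results.
sample_tagged = [
    ("At", "IN"),
    ("eight", "CD"),
    ("Thursday", "NNP"),
    ("film", "NN"),
    ("best", "JJS"),
    ("design", "NN"),
]


def extract_nouns(tagged):
    """Return the words whose tag starts with 'N' (NN, NNS, NNP, NNPS)."""
    # Unpacking each pair into (word, tag) avoids index arithmetic entirely.
    return [word for word, tag in tagged if tag.startswith("N")]


nouns = extract_nouns(sample_tagged)
print(nouns)  # ['Thursday', 'film', 'design']
```

Naming the two parts of each pair makes the condition easier to read than item[1][0], and startswith also copes with an empty tag string, where indexing would raise an IndexError.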
Upvotes: 8