Reputation: 156
I'm trying to extract all the proper nouns from a tagged paragraph. My code first splits the text into paragraphs and then checks each one for proper nouns. The problem is that no proper noun is ever extracted: execution never even enters the block that checks for a specific tag.
My code:
def noun(sen):
    m=[]
    if (sen.split('/')[1].lower().startswith('np')&sen.split('/')[1].lower().endswith('np')):
        w=sen.strip().split('/')[0]
        m.append(w)
    return m
import nltk
rp = open("tesu.txt", 'r')
text = rp.read()
list = []
sentences = splitParagraph(text)
for s in sentences:
    list.append(s)
Sample input from 'tesu.txt'
Several/ap defendants/nns in/in the/at Summerdale/np police/nn burglary/nn trial/nn made/vbd statements/nns indicating/vbg their/pp$ guilt/nn at/in the/at....
Bellows/np made/vbd the/at disclosure/nn when/wrb he/pps asked/vbd Judge/nn-tl Parsons/np to/to grant/vb his/pp$ client/nn ,/, Alan/np Clements/np ,/, 30/cd ,/, a/at separate/jj trial/nn ./.
How can I extract all the tagged proper nouns from a paragraph?
Upvotes: 0
Views: 451
Reputation: 42617
Thanks for the data sample.
You need to split the text into word/tag tokens such as Summerdale/np, then split each token on / to see if it is tagged np.
So something like the following (based on Bogdan's answer, thanks!)
def noun(sentence):
    # Collect every word in the sentence whose tag is exactly 'np'.
    nouns = []
    for token in sentence.split():
        word, tag = token.split('/')
        if tag.lower() == 'np':
            nouns.append(word)
    return nouns

if __name__ == '__main__':
    nouns = []
    with open('tesu.txt', 'r') as file_p:
        # Paragraphs are separated by a blank line.
        for sentence in file_p.read().split('\n\n'):
            result = noun(sentence)
            if result:
                nouns.extend(result)
    print(nouns)
which for your example data, produces:
['Summerdale', 'Bellows', 'Parsons', 'Alan', 'Clements']
Update: In fact, you can shorten the whole thing down to this:
nouns = []
with open('tesu.txt', 'r') as file_p:
    # Split on any whitespace, so paragraph boundaries are ignored.
    for token in file_p.read().split():
        word, tag = token.split('/')
        if tag.lower() == 'np':
            nouns.append(word)
print(nouns)
if you don't care which paragraph the nouns come from.
You could also get rid of the .lower() call if the tags are always lowercase, as they are in your example.
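For reference, here is a self-contained sketch of the same idea that works on a string instead of a file, so it can be tried without tesu.txt. The function name extract_proper_nouns and its tag_name parameter are made up for illustration. It uses str.rpartition instead of split('/'), on the assumption that a word might itself contain a slash or that a token might have no tag at all; neither case appears in your sample.

```python
def extract_proper_nouns(text, tag_name='np'):
    """Return words whose tag equals tag_name, in order of appearance."""
    nouns = []
    for token in text.split():
        # rpartition splits on the LAST '/', so a word containing '/'
        # keeps its slashes; a token with no '/' yields an empty sep.
        word, sep, tag = token.rpartition('/')
        if sep and tag.lower() == tag_name:
            nouns.append(word)
    return nouns


sample = (
    "Bellows/np made/vbd the/at disclosure/nn when/wrb he/pps asked/vbd "
    "Judge/nn-tl Parsons/np to/to grant/vb his/pp$ client/nn ,/, "
    "Alan/np Clements/np ,/, 30/cd ,/, a/at separate/jj trial/nn ./."
)
print(extract_proper_nouns(sample))
```

Note that Judge/nn-tl is not picked up, because only tags that are exactly np are matched.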
Upvotes: 1
Reputation: 8246
You should work on your code style. I think there are a lot of unnecessary loops in there. You also have an unnecessary function, splitParagraph, that basically only calls the already existing split method, and you import re but never use it afterwards. Also, indent your code; it's very hard to follow this way. You should provide a sample of the input from "tesu.txt" so we can help you more. Anyway, all of your code there could be compacted into:
def noun(token):
    # Return the word if its tag starts and ends with 'np', else False.
    word, tag = token.split('/')
    if (tag.lower().startswith('np') and tag.lower().endswith('np')):
        return word
    return False

if __name__ == '__main__':
    words = []
    with open('tesu.txt', 'r') as file_p:
        for sentence in file_p.read().split('\n\n'):
            # Check each word/tag token, not the whole sentence at once.
            for token in sentence.split():
                result = noun(token)
                if result:
                    words.append(result)
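One thing to watch for: the tag check has to run on a single word/tag token, not on a whole sentence. A small sketch, using the first few tokens of the sample input, shows what splitting a whole sentence on / produces and why a startswith('np') test on the second piece never matches. The variable names are only for illustration.

```python
sentence = "Several/ap defendants/nns in/in the/at Summerdale/np police/nn"

# Splitting the whole sentence on '/' glues each tag to the next word.
pieces = sentence.split('/')
second_piece = pieces[1]  # 'ap defendants', not a bare tag

# Splitting on whitespace first gives one word/tag pair per token.
pairs = [token.split('/') for token in sentence.split()]
proper_nouns = [word for word, tag in pairs if tag == 'np']

print(second_piece)
print(proper_nouns)
```

This prints 'ap defendants' for the second piece and ['Summerdale'] for the proper nouns.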
Upvotes: 0