user1052462

Reputation: 156

Extracting proper nouns from tagged chunks using Python

I'm trying to extract all the proper nouns from a tagged paragraph. In my code, I first extract each paragraph separately and then check whether it contains any proper nouns. The problem is that no proper nouns are ever extracted; my code never even enters the loop that checks for the specific tag.

My code:

def noun(sen):
    m = []
    if (sen.split('/')[1].lower().startswith('np') & sen.split('/')[1].lower().endswith('np')):
        w = sen.strip().split('/')[0]
        m.append(w)
    return m


import nltk
rp = open("tesu.txt", 'r')
text = rp.read()
list = []
sentences = splitParagraph(text)
for s in sentences:
    list.append(s)

Sample input from 'tesu.txt'

Several/ap defendants/nns in/in the/at Summerdale/np police/nn burglary/nn trial/nn      made/vbd statements/nns indicating/vbg their/pp$ guilt/nn at/in the/at.... 

Bellows/np made/vbd the/at disclosure/nn when/wrb he/pps asked/vbd Judge/nn-tl Parsons/np to/to grant/vb his/pp$ client/nn ,/, Alan/np Clements/np ,/, 30/cd ,/, a/at separate/jj trial/nn ./.

How can I extract all the tagged proper nouns from a paragraph?

Upvotes: 0

Views: 451

Answers (2)

DNA

Reputation: 42617

Thanks for the data sample.

You need to:

  • read each paragraph/line
  • split the line by whitespace to extract each tagged word, e.g. Summerdale/np
  • split the word by / to see if it is tagged np
  • if so, add the other half of the split (the actual word) to your noun list

So, something like the following should work (based on Bogdan's answer, thanks!):

def noun(sentence):
    # collect every word whose tag is 'np' (proper noun)
    nouns = []
    for token in sentence.split():
        word, tag = token.split('/')
        if tag.lower() == 'np':
            nouns.append(word)
    return nouns

if __name__ == '__main__':
    nouns = []
    with open('tesu.txt', 'r') as file_p:
        for sentence in file_p.read().split('\n\n'):
            result = noun(sentence)
            if result:
                nouns.extend(result)
    print(nouns)

which, for your example data, produces:

['Summerdale', 'Bellows', 'Parsons', 'Alan', 'Clements']

Update: In fact, you can shorten the whole thing down to this:

nouns = []
with open('tesu.txt', 'r') as file_p:
    for token in file_p.read().split():
        word, tag = token.split('/')
        if tag.lower() == 'np':
            nouns.append(word)
print(nouns)

if you don't care which paragraph the nouns come from.
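If you do want to keep the nouns grouped by paragraph, a minimal variation on the same idea collects one list per '\n\n'-separated block (nouns_by_paragraph is just an illustrative name):

# sketch: one list of proper nouns per paragraph, same file and
# word/tag format as above
nouns_by_paragraph = []
with open('tesu.txt', 'r') as file_p:
    for paragraph in file_p.read().split('\n\n'):
        para_nouns = []
        for token in paragraph.split():
            word, tag = token.split('/')
            if tag.lower() == 'np':
                para_nouns.append(word)
        nouns_by_paragraph.append(para_nouns)
print(nouns_by_paragraph)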

You could also get rid of the .lower() if the tags are always lowercase as they are in your example.
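Since you already import nltk, note that it ships a helper for exactly this word/tag format: nltk.tag.str2tuple splits on the last '/', so tokens containing extra slashes are handled too. A sketch using it, assuming the same tesu.txt:

from nltk.tag import str2tuple

nouns = []
with open('tesu.txt', 'r') as file_p:
    for token in file_p.read().split():
        # str2tuple('Summerdale/np') -> ('Summerdale', 'NP'); it
        # normalizes the tag's case, so compare case-insensitively
        word, tag = str2tuple(token)
        if tag and tag.lower() == 'np':
            nouns.append(word)
print(nouns)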

Upvotes: 1

Bogdan

Reputation: 8246

You should work on your code style. There are a lot of unnecessary loops in there, I think. You also have an unnecessary method, splitParagraph, that basically just calls the already existing split method, and you import re but never use it afterwards. Also, indent your code; it's very hard to follow this way. You should provide a sample of the input from "tesu.txt" so we can help you more. Anyway, all of your code could be compacted into:

def noun(sentence):
    word, tag = sentence.split('/')
    if tag.lower().startswith('np') and tag.lower().endswith('np'):
        return word
    return False

if __name__ == '__main__':
    words = []
    with open('tesu.txt', 'r') as file_p:
        for sentence in file_p.read().split('\n\n'):
            result = noun(sentence)
            if result:
                words.append(result)
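Incidentally, since your original code imports re, a regex-based sketch could pull the np-tagged words out directly. This assumes each tag is followed by whitespace or the end of the text:

import re

with open('tesu.txt', 'r') as file_p:
    text = file_p.read()
# capture the token immediately before '/np'
nouns = re.findall(r'(\S+)/np(?=\s|$)', text)
print(nouns)

For the sample data this should produce the same list as above.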

Upvotes: 0
