Mubashar Nazar

Reputation: 19

How to extract head nouns from a phrase in Python?

I am working on a keyphrase classification task, and for it I need to extract the head noun of each keyphrase in Python. The little help available on the internet has not been useful, and I am struggling with this.

Upvotes: 1

Views: 2830

Answers (3)

Giorgos Myrianthous

Reputation: 39860

Finding the nouns in a text is a Part-of-Speech (PoS) tagging task, which falls within the field of Natural Language Processing (NLP). To extract the nouns from a text you can use either NLTK:

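import nltk

# The tokenizer and tagger models may need a one-time download first (see nltk.download())

text = 'Your text goes here'

# Check if noun: Penn Treebank noun tags all start with 'NN' (NN, NNS, NNP, NNPS)
isNoun = lambda pos: pos[:2] == 'NN'

# Tokenize the text and keep only the nouns
tokenized = nltk.word_tokenize(text)
nouns = [word for (word, pos) in nltk.pos_tag(tokenized) if isNoun(pos)]
print(nouns)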

or TextBlob, whose noun_phrases property returns whole noun phrases rather than single nouns:

from textblob import TextBlob
text= 'Your text goes here'
blob = TextBlob(text)
print(blob.noun_phrases)

If you want to learn more about PoS tagging, you may find this post on the official NLTK page very useful.
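Neither snippet returns the head noun itself: NLTK gives every noun, and TextBlob gives whole noun phrases. A common heuristic for English noun phrases is that the head is the rightmost noun before any prepositional post-modifier. The sketch below assumes the phrase has already been tagged into (word, tag) pairs with Penn Treebank tags (for example by nltk.pos_tag); the helper name head_noun is hypothetical, and the tags in the example are written by hand rather than produced by NLTK.

```python
from typing import List, Optional, Tuple


def head_noun(tagged: List[Tuple[str, str]]) -> Optional[str]:
    """Guess the head noun of a PoS-tagged English noun phrase.

    Heuristic (an assumption, not a full parse): the head is the rightmost
    noun before the first preposition, so "extraction of head nouns" -> "extraction".
    """
    # Keep only the tokens that precede the first preposition (Penn Treebank tag 'IN')
    before_prep = []
    for word, tag in tagged:
        if tag == 'IN':
            break
        before_prep.append((word, tag))

    # Penn Treebank noun tags all start with 'NN' (NN, NNS, NNP, NNPS)
    nouns = [word for word, tag in before_prep if tag.startswith('NN')]
    if not nouns:
        # Fall back to the rightmost noun anywhere in the phrase
        nouns = [word for word, tag in tagged if tag.startswith('NN')]
    return nouns[-1] if nouns else None


# Tags written by hand for illustration; nltk.pos_tag may tag real input differently
print(head_noun([('keyphrase', 'NN'), ('classification', 'NN'), ('task', 'NN')]))  # → task
print(head_noun([('extraction', 'NN'), ('of', 'IN'), ('head', 'NN'), ('nouns', 'NNS')]))  # → extraction
```

With NLTK the input would be nltk.pos_tag(nltk.word_tokenize(phrase)). The preposition cut-off is only a rough rule; for longer or more complex phrases a dependency parse is more reliable.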

Upvotes: 0

berkin

Reputation: 558

You can use the Stanford Parser through NLTK to get dependency relations, and then use the relations that mark a head noun, such as nn or compound (noun compound modifier). You can take a look at de Marneffe's typed dependencies manual here.

In the manual's example, the noun phrase "oil price futures" is a compound with two modifiers ("oil" and "price") and the head "futures".
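Once the dependency relations are available, finding the head of a compound means finding the noun that governs the nn/compound arcs without being a modifier itself. This is a minimal sketch, assuming the relations are already in hand as (governor, relation, dependent) triples; the triples below are written by hand to mirror the manual's "oil price futures" example rather than produced by the Stanford Parser, and compound_head is a hypothetical helper.

```python
from typing import Iterable, Optional, Sequence, Tuple

# (governor, relation, dependent), e.g. nn(futures, oil) -> ('futures', 'nn', 'oil')
Triple = Tuple[str, str, str]

# Noun-compound relations: 'nn' in the Stanford typed dependencies manual,
# 'compound' in later (Universal Dependencies-style) label sets
COMPOUND_RELATIONS = {'nn', 'compound'}


def compound_head(tokens: Sequence[str], deps: Iterable[Triple]) -> Optional[str]:
    """Return the noun that governs compound modifiers without being one itself.

    Assumes each word occurs once in the phrase; real parser output identifies
    tokens by index, which this sketch ignores for readability.
    """
    compound_deps = [(gov, dep) for gov, rel, dep in deps if rel in COMPOUND_RELATIONS]
    modifiers = {dep for _, dep in compound_deps}
    governors = {gov for gov, _ in compound_deps}
    heads = [tok for tok in tokens if tok in governors and tok not in modifiers]
    if heads:
        return heads[-1]
    # No compound relation at all: a one-word phrase is its own head
    return tokens[0] if len(tokens) == 1 else None


# Hand-written relations mirroring the manual's "oil price futures" example
tokens = ['oil', 'price', 'futures']
deps = [('futures', 'nn', 'oil'), ('futures', 'nn', 'price')]
print(compound_head(tokens, deps))  # → futures
```

Because modifiers are excluded from the candidates, the same function also handles nested compounds where one modifier governs another, as some newer label sets produce.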

You can inspect the parse tree and dependencies of any sentence in the Stanford Parser online demo here.

Hope this helps,

Cheers

Upvotes: 1

Naga kiran

Reputation: 4607

You can apply part-of-speech tagging to a sentence with the NLTK package and then keep the words whose tags mark them as nouns (or verbs, and so on):

import nltk

text = '''I am doing a keyphrase classification task and for this i am working with the head noun extraction from keyphrases in python. The little help available on internet is not of good use. i am struggling with this.'''
pos_tagged_sent = nltk.pos_tag(nltk.tokenize.word_tokenize(text))

# 'NN' matches singular common nouns only; tag[1].startswith('NN') would also keep NNS, NNP, NNPS
nouns = [tag[0] for tag in pos_tagged_sent if tag[1] == 'NN']

Out (pos_tagged_sent, truncated):

[('I', 'PRP'),
 ('am', 'VBP'),
 ('doing', 'VBG'),
 ('a', 'DT'),
 ('keyphrase', 'NN'),
 ('classification', 'NN'),
 ...]
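Since this answer mentions extracting verbs as well, one way to pull out several word classes in a single pass is to match on Penn Treebank tag prefixes instead of exact tags, which also catches the plural (NNS) and proper (NNP, NNPS) nouns that tag[1]=='NN' skips. A minimal sketch, assuming the (word, tag) pairs shown in the output above as input; words_by_coarse_tag is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def words_by_coarse_tag(tagged: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group words into 'noun' and 'verb' buckets using Penn Treebank tag prefixes."""
    # NN, NNS, NNP, NNPS -> noun; VB, VBD, VBG, VBN, VBP, VBZ -> verb
    prefixes = {'NN': 'noun', 'VB': 'verb'}
    groups = defaultdict(list)
    for word, tag in tagged:
        coarse = prefixes.get(tag[:2])
        if coarse is not None:
            groups[coarse].append(word)
    return dict(groups)


# First tokens of the tagged output shown above
sample = [('I', 'PRP'), ('am', 'VBP'), ('doing', 'VBG'), ('a', 'DT'),
          ('keyphrase', 'NN'), ('classification', 'NN')]
print(words_by_coarse_tag(sample))
# → {'verb': ['am', 'doing'], 'noun': ['keyphrase', 'classification']}
```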

Upvotes: 0
