Reputation: 19
I am working on a keyphrase classification task, and for this I need to extract the head noun from each keyphrase in Python. The little help I found on the internet has not been of much use, and I am struggling with this.
Upvotes: 1
Views: 2830
Reputation: 39860
Finding the nouns in a text relies on Part-of-Speech (PoS) tagging, a standard task in Natural Language Processing (NLP). To extract the nouns from a text you can use either nltk:
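import nltk
# One-time model downloads; resource names vary by NLTK version
# (e.g. 'punkt'/'punkt_tab' and 'averaged_perceptron_tagger'/'averaged_perceptron_tagger_eng')
text = 'Your text goes here'
# Penn Treebank noun tags all start with 'NN' (NN, NNS, NNP, NNPS)
isNoun = lambda pos: pos[:2] == 'NN'
# Tokenize the text and keep only the nouns
tokenized = nltk.word_tokenize(text)
nouns = [word for (word, pos) in nltk.pos_tag(tokenized) if isNoun(pos)]
print(nouns)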
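Since the question asks for the head noun rather than every noun, a common heuristic for English keyphrases is that the head is the rightmost noun, except that in a phrase like "price of oil" the head comes before the preposition. Below is a minimal sketch, assuming the input is already tagged with Penn Treebank tags (for example, the list returned by nltk.pos_tag); head_noun is a hypothetical helper name, and the heuristic will get some phrases wrong.

```python
from typing import List, Optional, Tuple

def head_noun(tagged: List[Tuple[str, str]]) -> Optional[str]:
    """Return the head noun of a PoS-tagged keyphrase (rightmost-noun heuristic).

    Hypothetical helper: assumes Penn Treebank tags such as those produced
    by nltk.pos_tag. Returns None if the phrase contains no noun.
    """
    # Cut the phrase at the first preposition (tag 'IN') that is not the first
    # token, so "price of oil" is reduced to "price".
    for i, (_, tag) in enumerate(tagged):
        if tag == 'IN' and i > 0:
            tagged = tagged[:i]
            break
    # Scan from the right and return the last noun (NN, NNS, NNP, NNPS).
    for word, tag in reversed(tagged):
        if tag.startswith('NN'):
            return word
    return None

print(head_noun([('keyphrase', 'NN'), ('classification', 'NN'), ('task', 'NN')]))  # task
print(head_noun([('price', 'NN'), ('of', 'IN'), ('oil', 'NN')]))  # price
```

With NLTK, the argument for a single keyphrase would be nltk.pos_tag(nltk.word_tokenize(keyphrase)).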
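or TextBlob, whose noun_phrases property returns whole noun phrases rather than single nouns:
from textblob import TextBlob
# Requires the TextBlob corpora: python -m textblob.download_corpora
text = 'Your text goes here'
blob = TextBlob(text)
print(blob.noun_phrases)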
If you want to learn more about PoS tagging, you may find this page from the official NLTK documentation very useful.
Upvotes: 0
Reputation: 558
You can use the Stanford Parser through NLTK to get dependency relations, then use the relations that suit your task, such as nn or compound (noun compound modifier). You can take a look at de Marneffe's Stanford typed dependencies manual here.
In the manual, the noun phrase "oil price futures" contains two compound modifiers (oil, price) and a head (futures).
You can check the parse tree and dependencies of any sentence in the Stanford Parser online demo here.
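The dependency-based idea can be sketched independently of the parser: given the nn/compound arcs returned for a phrase, the head is the word that is not itself a compound modifier of another word. A minimal sketch, assuming the arcs are available as (governor, relation, dependent) string triples; the triple format and the name compound_head are hypothetical, not NLTK's or Stanford's API. For "oil price futures" the manual's analysis is nn(futures, oil) and nn(futures, price). Words are matched by string, so a phrase that repeats a word would need token indices instead.

```python
from typing import Iterable, List, Optional, Tuple

# Relations treated as noun-compound modifiers: 'nn' in the older Stanford
# dependencies, 'compound' in Universal Dependencies.
COMPOUND_RELS = {'nn', 'compound'}

def compound_head(tokens: List[str],
                  arcs: Iterable[Tuple[str, str, str]]) -> Optional[str]:
    """Return the rightmost token of a noun phrase that is not a compound modifier.

    Hypothetical helper: `arcs` are (governor, relation, dependent) triples
    from a dependency parse of the phrase.
    """
    modifiers = {dep for _, rel, dep in arcs if rel in COMPOUND_RELS}
    # The head of an English noun compound is usually on the right,
    # so scan from the end and skip every word that modifies another.
    for tok in reversed(tokens):
        if tok not in modifiers:
            return tok
    return None

tokens = ['oil', 'price', 'futures']
arcs = [('futures', 'nn', 'oil'), ('futures', 'nn', 'price')]
print(compound_head(tokens, arcs))  # futures
```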
Hope this helps,
Cheers
Upvotes: 1
Reputation: 4607
You can PoS-tag the sentence with the NLTK toolkit and then keep the words whose tags mark them as nouns (or verbs, if you need those as well):
import nltk
text = '''I am doing a keyphrase classification task and for this i am working with the head noun extraction from keyphrases in python. The little help available on internet is not of good use. i am struggling with this.'''
pos_tagged_sent = nltk.pos_tag(nltk.tokenize.word_tokenize(text))
# startswith('NN') also keeps plural and proper nouns (NNS, NNP, NNPS)
nouns = [tag[0] for tag in pos_tagged_sent if tag[1].startswith('NN')]
pos_tagged_sent
Out:
[('I', 'PRP'),
('am', 'VBP'),
('doing', 'VBG'),
('a', 'DT'),
('keyphrase', 'NN'),
('classification', 'NN'),
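Because Penn Treebank tags share a prefix per word class (NN, NNS, NNP, NNPS for nouns; VB, VBD, VBG, VBN, VBP, VBZ for verbs), nouns and verbs can be collected in one pass by checking the tag prefix. A minimal sketch that works on any list of (word, tag) pairs such as pos_tagged_sent; the function name words_by_category and the default prefixes are assumptions for illustration.

```python
from typing import Dict, List, Tuple

def words_by_category(tagged: List[Tuple[str, str]],
                      prefixes: Tuple[str, ...] = ('NN', 'VB')) -> Dict[str, List[str]]:
    """Group words by the Penn Treebank tag prefix they match (hypothetical helper)."""
    groups: Dict[str, List[str]] = {p: [] for p in prefixes}
    for word, tag in tagged:
        # Assign each word to the first prefix its tag starts with, if any.
        for p in prefixes:
            if tag.startswith(p):
                groups[p].append(word)
                break
    return groups

sample = [('I', 'PRP'), ('am', 'VBP'), ('doing', 'VBG'), ('a', 'DT'),
          ('keyphrase', 'NN'), ('classification', 'NN')]
print(words_by_category(sample))
# {'NN': ['keyphrase', 'classification'], 'VB': ['am', 'doing']}
```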
Upvotes: 0