Reputation: 813
I want to tag domain-specific nouns, such as technical and scientific terms, in a sentence using part-of-speech tagging.
Example
Consider the sentences:
1) Computers need keyboard, monitor, CPU to work.
2) Automobile uses gears and clutch.
My objective is to have the example sentences tagged as:
Computers/technical
need/verb
keyboard/technical
CPU/technical
to/preposition
work/verb
Automobile/mechanical
uses/verb
gears/mechanical
and/conjunction
clutch/mechanical
My Previous Works
I have already tried Stanford NLP and OpenNLP. They produce POS tags, but not the domain categories I need.
How can I do this?
Upvotes: 0
Views: 197
Reputation: 8366
Named entity recognition (NER) is an entity identification/extraction technique that locates entities in text and classifies them into predefined categories (e.g. motherboard --> technical, RAM --> technical, random access memory --> technical). NER systems typically use linguistic grammar-based methods and statistical methods. I doubt you will need to get into the details of these methods for your task. If you do get interested, feel free to read up on conditional random fields.
As far as I can see, all you need is to be able to train your own NER with your categories (i.e. technical, mechanical, etc.). The Stanford NER FAQ page provides adequate information on how to do this.
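To make the training step more concrete, here is a minimal sketch of how labelled training data could be prepared. It assumes the tab-separated "token, label" layout with `O` as the background label, which is my understanding of what the Stanford NER FAQ describes; check the FAQ before relying on it. The function name `to_training_lines`, the label names, and the blank line between sentences are my own assumptions, not part of any library.

```python
def to_training_lines(sentences, word_to_label, background="O"):
    """Turn tokenized sentences into 'token<TAB>label' lines.

    sentences: list of token lists.
    word_to_label: hypothetical seed dictionary (lowercased word -> label).
    Tokens not in the dictionary get the background label.
    """
    lines = []
    for tokens in sentences:
        for token in tokens:
            label = word_to_label.get(token.lower(), background)
            lines.append(f"{token}\t{label}")
        lines.append("")  # assumed separator between sentences
    return lines


# Hypothetical seed labels taken from the question's examples.
SEED_LABELS = {"automobile": "MECHANICAL", "gears": "MECHANICAL", "clutch": "MECHANICAL"}

training_lines = to_training_lines(
    [["Automobile", "uses", "gears", "and", "clutch", "."]], SEED_LABELS
)
```

Writing `"\n".join(training_lines)` to a file would give you something to point the trainer at. In practice you would still want to correct the labels by hand, since a seed dictionary only bootstraps the annotation.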
For an intuitive understanding of how the final system will work, you can take a look at the online demo of the Stanford NER. They provide English, Chinese and German classifiers. There are three English classifiers that were trained on 3, 4 and 7 categories ... try them out, and see for yourself.
I've tried to be as succinct as possible. A detailed introduction to NER is not possible on SO. I hope my answer, together with the links provided, helps your task.
Upvotes: 1
Reputation: 3953
Interesting problem; here are a few thoughts. Since you need the parts of speech, use a part-of-speech tagger such as OpenNLP, which will give you the POS tags you need. The second part (classifying certain words) is a bit trickier. If the set of words that map to each category is limited, you could simply use a lookup list. This is sometimes the simplest and most accurate option, whereas an NER model will give you some noise. If the set is not limited, you can do what was already suggested, which is to train an NER model.
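The lookup-list idea could look roughly like the sketch below. It assumes you already have (token, POS tag) pairs from whatever tagger you use; the word lists and the function name `tag_domain_terms` are made up for illustration, and a real list would come from a domain glossary.

```python
# Hypothetical lookup lists built from the question's examples.
CATEGORY_LEXICON = {
    "technical": {"computer", "computers", "keyboard", "monitor", "cpu"},
    "mechanical": {"automobile", "gears", "clutch"},
}

# Invert to word -> category for constant-time lookups.
WORD_TO_CATEGORY = {
    word: category
    for category, words in CATEGORY_LEXICON.items()
    for word in words
}


def tag_domain_terms(pos_tagged):
    """Replace the POS tag with a domain category when the token is listed.

    pos_tagged: list of (token, pos_tag) pairs from any POS tagger.
    Tokens not found in the lookup list keep their original POS tag.
    """
    return [
        (token, WORD_TO_CATEGORY.get(token.lower(), pos))
        for token, pos in pos_tagged
    ]


# POS tags here are written by hand; in practice they come from the tagger.
tagged = tag_domain_terms([
    ("Automobile", "noun"), ("uses", "verb"), ("gears", "noun"),
    ("and", "conjunction"), ("clutch", "noun"),
])
```

Matching is case-insensitive and exact, so inflected or multi-word terms ("random access memory") would need extra handling such as lemmatization or phrase matching.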
Upvotes: 1