Reputation: 5291
I am new to NLP and am looking for a starting point: tutorials, documentation, or example code. I have been asked to research ways of processing natural-language text to extract structured data from it. For example, I want to extract (annotate) height and weight from statements such as "He is 6 feet tall and weighs 200 pounds" or "His height is 6 feet and weight is 200". I have looked into UIMA, but it seems to be a self-created regex dictionary with no training capabilities. In a nutshell: what Java framework can I use to build an annotation engine that can also be trained? Any help or pointers would be greatly appreciated. Thanks.
Upvotes: 2
Views: 1199
Reputation: 6039
I'd use named entity recognition (NER). Here is the output I see for your input text:
You can try it here: http://deagol.cs.illinois.edu:8080
Upvotes: 0
Reputation: 15931
If you really want to use machine learning to train your annotator, then GATE is probably your best bet. Take a look at the chapter on machine learning in their user guide.
Upvotes: 3
Reputation: 2579
Since you asked for pointers: LingPipe (already mentioned above), OpenNLP, and Stanford NLP distributions.
Note: if Python is an option, you can use the Natural Language Toolkit.
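Before training anything with one of those toolkits, it can help to have a rule-based baseline to compare against. The following is only a minimal sketch using the standard `java.util.regex` package; the class name `MeasurementAnnotator`, the method `annotate`, and both patterns are hypothetical and cover just the two example phrasings from the question, not measurements in general.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical baseline annotator: plain regular expressions, no training.
final class MeasurementAnnotator {

    // Assumption: height is a number directly followed by a unit in feet.
    private static final Pattern HEIGHT = Pattern.compile(
            "(\\d+(?:\\.\\d+)?)\\s*(?:feet|foot|ft)\\b",
            Pattern.CASE_INSENSITIVE);

    // Assumption: weight follows "weighs" or "weight is"; the unit is optional.
    private static final Pattern WEIGHT = Pattern.compile(
            "weigh(?:s|t\\s+is)\\s+(\\d+(?:\\.\\d+)?)",
            Pattern.CASE_INSENSITIVE);

    // Returns the first height and weight values found, keyed by field name.
    // A field is left out of the map when its pattern does not match.
    static Map<String, String> annotate(String text) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher height = HEIGHT.matcher(text);
        if (height.find()) {
            result.put("height", height.group(1));
        }
        Matcher weight = WEIGHT.matcher(text);
        if (weight.find()) {
            result.put("weight", weight.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(annotate("He is 6 feet tall and weighs 200 pounds"));
        System.out.println(annotate("His height is 6 feet and weight is 200"));
    }
}
```

Every new phrasing needs another hand-written rule here, which is exactly the limitation a trainable annotator is meant to remove. The sentences this baseline misses are good candidates for the training data you would feed to a statistical model.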
Upvotes: 5