Dilanka M

Reputation: 382

How to extract/identify topics from a given text using stanford-nlp or OpenNLP in Java

I am about to extract information from raw text published on social media, news sites, and blogs that relates to a specific field such as politics, war, or drugs. I have already started using some open source libraries such as stanford-nlp and Apache OpenNLP, as well as a commercially licensed tool called Lexalytics.

In my project, we analyze publicly posted text, generate results, and mine it based on some parameters to identify which category each post belongs to.

But I need to extract topics from the given text using the stanford-nlp library. By topic I mean text or sentences related to a subject such as EDUCATION or POLITICS. I am already able to extract entities, that is, text or sentences containing LOCATION, DATE, PERSON, MONEY and so on.

The same kind of topic extraction is available in Lexalytics as well, but that is a licensed tool.

Your help is appreciated.

Thanks.

Upvotes: 0

Views: 974

Answers (1)

kavin

Reputation: 96

Topic extraction from text documents can be done with generative modeling, where the words in a document are assumed a priori to be drawn from distributions determined by the document's topic(s). Algorithms such as LDA (Latent Dirichlet Allocation) are used for this.
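To make the LDA idea a bit more concrete, here is a minimal sketch of a collapsed Gibbs sampler in plain Java. It is not part of stanford-nlp or OpenNLP, and the names `TinyLda`, `fit`, `alpha`, and `beta` are my own illustrative choices. It assumes each document has already been tokenized and mapped to integer word ids in the range `0..vocabSize-1` (for example with the tokenizer you already use), and it only returns the per-document topic proportions.

```java
import java.util.Random;

// Hypothetical helper class for illustration; not an API of any NLP library.
class TinyLda {
    // docs[d][i] is the word id of the i-th token in document d.
    // Returns theta[d][k]: the estimated share of topic k in document d.
    static double[][] fit(int[][] docs, int vocabSize, int numTopics,
                          double alpha, double beta, int iterations, long seed) {
        Random rng = new Random(seed);
        int[][] z = new int[docs.length][];               // topic assigned to each token
        int[][] docTopic = new int[docs.length][numTopics];
        int[][] topicWord = new int[numTopics][vocabSize];
        int[] topicTotal = new int[numTopics];

        // Start from a random topic assignment for every token.
        for (int d = 0; d < docs.length; d++) {
            z[d] = new int[docs[d].length];
            for (int i = 0; i < docs[d].length; i++) {
                int k = rng.nextInt(numTopics);
                z[d][i] = k;
                docTopic[d][k]++;
                topicWord[k][docs[d][i]]++;
                topicTotal[k]++;
            }
        }

        double[] p = new double[numTopics];
        for (int it = 0; it < iterations; it++) {
            for (int d = 0; d < docs.length; d++) {
                for (int i = 0; i < docs[d].length; i++) {
                    int w = docs[d][i];
                    int old = z[d][i];
                    // Remove the current token from the counts.
                    docTopic[d][old]--;
                    topicWord[old][w]--;
                    topicTotal[old]--;

                    // Unnormalized conditional probability of each topic for this token.
                    double sum = 0.0;
                    for (int k = 0; k < numTopics; k++) {
                        p[k] = (docTopic[d][k] + alpha) * (topicWord[k][w] + beta)
                                / (topicTotal[k] + vocabSize * beta);
                        sum += p[k];
                    }

                    // Draw a new topic in proportion to p.
                    double u = rng.nextDouble() * sum;
                    int k = 0;
                    while (k < numTopics - 1 && u >= p[k]) {
                        u -= p[k];
                        k++;
                    }

                    z[d][i] = k;
                    docTopic[d][k]++;
                    topicWord[k][w]++;
                    topicTotal[k]++;
                }
            }
        }

        // Smoothed topic proportions per document; each row sums to 1.
        double[][] theta = new double[docs.length][numTopics];
        for (int d = 0; d < docs.length; d++) {
            for (int k = 0; k < numTopics; k++) {
                theta[d][k] = (docTopic[d][k] + alpha) / (docs[d].length + numTopics * alpha);
            }
        }
        return theta;
    }
}
```

A call such as `TinyLda.fit(docs, vocabSize, 2, 0.1, 0.01, 200, 42L)` gives one row of topic proportions per document. Note that the topics it finds are unlabeled clusters of words; mapping a topic to a name like EDUCATION or POLITICS is a separate step you would have to do yourself, for instance by inspecting the most frequent words of each topic.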

The Open Calais API (http://www.opencalais.com/opencalais-api/) returns one or more topics that the document is about, each with an associated confidence value.

Upvotes: 0
