Sankha Sumadhura

Reputation: 95

How to extract sub-topic sentences of a review using Python and NLTK?

Is there an efficient way to extract the sub-topic statements of a review using Python and the NLTK library? As an example, a user review of a mobile phone could be "This phone's battery is good but display is a bullshit". I want to extract the two features above, like:

"Battery is good"
"display is a bullshit"

The purpose is to develop a rating system for products with respect to the features of each product. The polarity analysis part is done, but extracting the features of a review is difficult for me. I did find a way to extract features using POS tag patterns with regular expressions, like

<NN.?><VB.?>?<JJ.?> 

where each match of this pattern is treated as a sub-topic. The problem is that a review can contain many different patterns, depending on how users phrase their descriptions.
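For illustration, here is a minimal sketch of how the pattern above could be applied with NLTK's `RegexpParser`. The tokens are tagged by hand with Penn Treebank tags (an assumption, so that no tagger model has to be downloaded; `nltk.pos_tag` might tag some words differently), and the chunk label `SUBTOPIC` is a made-up name.

```python
import nltk

# Hand-tagged tokens (assumed Penn Treebank tags) so the sketch runs
# without downloading a tagger model.
tagged = [
    ("This", "DT"), ("phone", "NN"), ("'s", "POS"),
    ("battery", "NN"), ("is", "VBZ"), ("good", "JJ"),
    ("but", "CC"),
    ("display", "NN"), ("is", "VBZ"), ("a", "DT"), ("bullshit", "NN"),
]

# The POS pattern wrapped in a chunk rule; "SUBTOPIC" is a hypothetical label.
grammar = "SUBTOPIC: {<NN.?><VB.?>?<JJ.?>}"
parser = nltk.RegexpParser(grammar)
tree = parser.parse(tagged)

# Collect the words of every chunk that matched the pattern.
subtopics = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in tree.subtrees(filter=lambda t: t.label() == "SUBTOPIC")
]
print(subtopics)
```

With these tags only "battery is good" is matched. The second clause ends in determiner plus noun rather than an adjective, so it is missed, which is exactly the coverage problem described above.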

Is there any way to solve my problem efficiently? Thank you!

Upvotes: 1

Views: 643

Answers (1)

sophros

Reputation: 16660

The question you posed is multi-faceted and not straightforward to answer.

Conceptually, you may want to go through the following steps:

  1. Identify the names of the features of phones (and maybe create an ontology based on these features).

  2. Create lists of synonyms for the feature names (and similarly for evaluative phrases, e.g. nice, bad, sucks, etc.).

  3. Use one of the NLTK taggers to tag the reviews.

  4. Create rules for extraction of features and their evaluation (Information Extraction part). I am not sure if NLTK can directly support you with this.

  5. Evaluate and refine the approach.

Or: create a larger annotated corpus and train a deep learning model on it using TensorFlow, Theano, or something similar.
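As a rough sketch of the rule-based route (steps 1, 2 and 4), the following plain-Python example uses small hand-made lexicons and splits a review into clauses at conjunctions and punctuation. The lexicon contents, the polarity scores and the function name `extract_feature_opinions` are all assumptions made for illustration; a real system would need much larger lexicons and probably POS tags from step 3.

```python
import re

# Hypothetical lexicons: canonical feature -> synonyms, opinion word -> polarity.
FEATURE_SYNONYMS = {
    "battery": {"battery", "batteries"},
    "display": {"display", "screen"},
}
OPINION_WORDS = {"good": 1, "nice": 1, "bad": -1, "sucks": -1, "bullshit": -1}

# Clause boundaries: a few conjunctions plus sentence punctuation.
CLAUSE_SPLIT = re.compile(r"\b(?:but|however|although|and)\b|[.;,!?]", re.IGNORECASE)


def extract_feature_opinions(review):
    """Return (feature, clause, polarity) for each clause naming a known feature."""
    results = []
    for clause in CLAUSE_SPLIT.split(review):
        tokens = re.findall(r"[a-z]+", clause.lower())
        if not tokens:
            continue
        # Sum the polarity of all known opinion words in the clause.
        polarity = sum(OPINION_WORDS.get(tok, 0) for tok in tokens)
        for feature, synonyms in FEATURE_SYNONYMS.items():
            if synonyms.intersection(tokens):
                results.append((feature, " ".join(clause.split()), polarity))
    return results


review = "This phone's battery is good but display is a bullshit"
for item in extract_feature_opinions(review):
    print(item)
```

Splitting on "and" is a crude choice: it would wrongly separate phrases such as "battery and display are good", so the clause rules are the first thing to refine in step 5.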

Upvotes: 1
