Reputation: 507
I'm developing an aspect-level sentiment analysis project for online reviews in the travel domain.
I have a human-annotated dataset with labelled aspect terms and aspect categories, along with their sentiment polarity.
For example:
Sentence:
This beach was a wonderful time for a day party it had a fun crowd and has a big bar with a great atmosphere. The food was delicious too.
The above sentence has the following aspect terms labelled:
{party#positive C} {crowd#positive C} {bar#positive C} {food#positive C}
And the following aspect categories:
{entertainment#positive C} {accommodation#positive C}
I want to try a supervised learning approach to train a model to classify aspect terms from sentences.
I'm using the Stanford CoreNLP library, but I'm confused about what the training data format should be and what the best approach is.
I have seen people using IOB notation to format training data for NER systems. Can I use a similar method to get this done? That is, how do I format my training data file so that a model can extract aspect terms like the ones above from an input sentence?
If someone can point me in the right direction, I would appreciate that a lot.
Upvotes: 1
Views: 887
Reputation: 301
This problem can be tackled by breaking it down into smaller subtasks. A possible pipeline approach may be:
The first stage is aspect term extraction, which identifies aspect terms in the raw text. This too can be broken down into two subtasks. First, your system will need to label the tokens in the text that are aspect terms. Let's call these labelled tokens aspect term mentions. This is called Named Entity Recognition (NER). Next, if you have a pre-defined set of aspect term classes, the system will need to link the aspect term mentions found in the previous task to those classes. This is called Entity Linking. It's worth noting that, judging from the example you give, the labelled dataset is not yet suitable for the above tasks, as the labels are not anchored in the text. You may be able to create a suitable dataset by guessing which tokens in the text your given labels correspond to. This is similar to work on Distant Supervision.
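As a rough illustration of anchoring the labels in the text, here is a minimal sketch that guesses token positions by exact string matching and writes IOB tags. The regex tokenizer, the `ASPECT` tag name, and the helper names `tokenize` and `iob_tags` are assumptions made for this example, not part of CoreNLP. Exact matching will miss inflected or paraphrased terms, so treat the result as noisy training data.

```python
import re

def tokenize(text):
    # Naive tokenizer (an assumption): runs of word characters, or single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)

def iob_tags(tokens, aspect_terms):
    """Tag every exact, case-insensitive occurrence of each aspect term with IOB labels."""
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    for term in aspect_terms:
        term_tokens = [t.lower() for t in tokenize(term)]
        n = len(term_tokens)
        if n == 0:
            continue
        for i in range(len(tokens) - n + 1):
            # Only tag spans that match the term and are not already tagged.
            if lowered[i:i + n] == term_tokens and all(t == "O" for t in tags[i:i + n]):
                tags[i] = "B-ASPECT"
                for j in range(i + 1, i + n):
                    tags[j] = "I-ASPECT"
    return tags

sentence = ("This beach was a wonderful time for a day party it had a fun crowd "
            "and has a big bar with a great atmosphere. The food was delicious too.")
tokens = tokenize(sentence)
tags = iob_tags(tokens, ["party", "crowd", "bar", "food"])

# One token per line, tab-separated from its tag.
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

The one-token-per-line, tab-separated layout is the general shape that column-based sequence taggers tend to read. Check the documentation of the tagger you use for the exact column order and settings it expects.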
The next task is aspect term sentiment classification. Convolutional Neural Networks have been used for sentence and document sentiment classification, and they can probably be adapted for your purposes if you provide a marker at the input indicating which tokens are being classified. This is called a position embedding in this work: http://www.cs.nyu.edu/~thien/pubs/vector15.pdf
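One way to build such a marker, sketched here under my own assumptions, is to give every token its signed distance to the aspect term being classified, clip that distance, and shift it to a non-negative id that can index an embedding table. The function names and the `max_distance` parameter are hypothetical, and the linked work may define the feature differently in its details.

```python
def relative_positions(num_tokens, target_index, max_distance=10):
    """Signed distance from each token to the target token, clipped to +/- max_distance."""
    return [
        max(-max_distance, min(max_distance, i - target_index))
        for i in range(num_tokens)
    ]

def position_ids(num_tokens, target_index, max_distance=10):
    """Shift clipped distances into the range 0..2*max_distance for an embedding lookup."""
    return [d + max_distance for d in relative_positions(num_tokens, target_index, max_distance)]

tokens = ["The", "food", "was", "delicious", "too"]
target = tokens.index("food")  # the aspect term whose sentiment is being classified

ids = position_ids(len(tokens), target, max_distance=3)
print(list(zip(tokens, ids)))
```

Each id would then select a row of a learned position embedding matrix with `2 * max_distance + 1` rows, and that vector would be concatenated with the token's word embedding before the convolution. The same sentence is fed once per aspect term, with a different `target` each time.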
Upvotes: 3