elspanishgeek

Aspect Extraction on spaCy

[Disclaimer: I also posted this question to spaCy's GitHub Discussions]

I’m trying to avoid some common pitfalls, so I’m looking for guidance on the best approach to aspect extraction in spaCy, given a taxonomy. I’ve considered three approaches:


1) NER + DependencyMatcher

Entity Recognition: Build a rich set of entities for each taxonomy category. For the SERVICE category, for example, entities such as chef, cook, waiter, waitress, waitstaff, server, bartender, barman, barwoman, etc.

Dependency Pattern Matching: Create multiple DependencyMatcher patterns to capture phrases anchored on those entities.

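Example Pattern:

"ADJ_AUX_SERVICE": [
    # Anchor: an adjectival complement, e.g. "rude" in "the waiter was rude"
    {"RIGHT_ID": "adj", "RIGHT_ATTRS": {"POS": "ADJ", "DEP": "acomp"}},
    # "<": the adjective is an immediate dependent of an auxiliary ("was")
    {"LEFT_ID": "adj", "REL_OP": "<", "RIGHT_ID": "aux", "RIGHT_ATTRS": {"POS": "AUX"}},
    # ">": that auxiliary is the immediate head of a SERVICE entity acting as subject ("waiter")
    {"LEFT_ID": "aux", "REL_OP": ">", "RIGHT_ID": "service", "RIGHT_ATTRS": {"ENT_TYPE": "SERVICE", "DEP": "nsubj"}}
],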
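
For reference, here is a minimal sketch of how a pattern like this could be registered and run, assuming spaCy v3. To keep it independent of any trained model, the parse and the SERVICE entity for the sample sentence "The waiter was rude" are annotated by hand; in a real pipeline they would come from the parser and NER components.

```python
import spacy
from spacy.matcher import DependencyMatcher
from spacy.tokens import Doc, Span

nlp = spacy.blank("en")  # blank pipeline: no trained model is needed for this sketch

# Hand-annotated parse of "The waiter was rude", standing in for parser + NER output.
doc = Doc(
    nlp.vocab,
    words=["The", "waiter", "was", "rude"],
    pos=["DET", "NOUN", "AUX", "ADJ"],
    heads=[1, 2, 2, 2],  # absolute index of each token's head; the root points to itself
    deps=["det", "nsubj", "ROOT", "acomp"],
)
doc.ents = [Span(doc, 1, 2, label="SERVICE")]  # "waiter" tagged as a SERVICE entity

pattern = [
    {"RIGHT_ID": "adj", "RIGHT_ATTRS": {"POS": "ADJ", "DEP": "acomp"}},
    {"LEFT_ID": "adj", "REL_OP": "<", "RIGHT_ID": "aux", "RIGHT_ATTRS": {"POS": "AUX"}},
    {"LEFT_ID": "aux", "REL_OP": ">", "RIGHT_ID": "service",
     "RIGHT_ATTRS": {"ENT_TYPE": "SERVICE", "DEP": "nsubj"}},
]

matcher = DependencyMatcher(nlp.vocab)
matcher.add("ADJ_AUX_SERVICE", [pattern])

# Each match is (match_id, token_ids), with token_ids in pattern order: adj, aux, service.
extractions = []
for match_id, token_ids in matcher(doc):
    adj, aux, service = (doc[i] for i in token_ids)
    extractions.append((nlp.vocab.strings[match_id], service.text, aux.text, adj.text))

print(extractions)
```

Because the token ids come back in pattern order rather than sentence order, unpacking them by position (as above) is what ties each token to its RIGHT_ID.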

I can also expand the matched tokens to include the subtrees they belong to or add modifiers (like an "advmod" on the adjective) to capture intensifiers such as “very” or “kinda.”
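
A rough sketch of that expansion step, assuming spaCy v3 and a hand-annotated parse of "The waiter was very rude" in place of real parser output. The helper name expand_match is hypothetical, and the three tokens are picked by index here to stand in for what the DependencyMatcher would return.

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")

# Hand-annotated parse of "The waiter was very rude" (stand-in for real parser output).
doc = Doc(
    nlp.vocab,
    words=["The", "waiter", "was", "very", "rude"],
    pos=["DET", "NOUN", "AUX", "ADV", "ADJ"],
    heads=[1, 2, 2, 4, 2],  # absolute head indices; the root points to itself
    deps=["det", "nsubj", "ROOT", "advmod", "acomp"],
)

# Tokens as a DependencyMatcher match would identify them (adj, aux, service).
adj, aux, service = doc[4], doc[2], doc[1]


def expand_match(adj, aux, service):
    """Hypothetical helper: smallest span covering aux plus the subtrees of adj and service."""
    indices = {aux.i}
    for anchor in (adj, service):
        # Token.subtree yields the token itself and all of its descendants.
        indices.update(tok.i for tok in anchor.subtree)
    return adj.doc[min(indices): max(indices) + 1]


phrase = expand_match(adj, aux, service)

# Intensifiers are the adjective's direct "advmod" children.
intensifiers = [child.text for child in adj.children if child.dep_ == "advmod"]

print(phrase.text, intensifiers)
```

The auxiliary's own subtree is deliberately left out: when the auxiliary is the sentence root, its subtree is the whole sentence, which would defeat the point of extracting a phrase.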

(Optional) Classification: Use a multiclass TextCat on the extracted phrases to assign a final category label (or none), and use that to filter out extractions that are not useful.

Sentiment Scoring: Run a separate sentiment classification model on the matched phrases.


2) TextCat/SpanCat

Data Preparation: Keep documents short (from a sentence up to a few sentences) and train a multilabel TextCat or SpanCat model based on the available annotations.
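
To make the data preparation concrete, this is a small sketch of how one short document could carry both kinds of annotation, assuming spaCy v3. The sentence and the FOOD and AMBIENCE labels are made up for illustration; "sc" is the spans key that SpanCat reads by default.

```python
import spacy
from spacy.tokens import DocBin, Span

nlp = spacy.blank("en")  # tokenizer only; enough to build training examples
doc = nlp("The staff was very rude but the pizza was amazing.")

# Multilabel TextCat: one score per taxonomy category, not mutually exclusive.
doc.cats = {"SERVICE": 1.0, "FOOD": 1.0, "AMBIENCE": 0.0}

# SpanCat: labeled spans (overlaps are allowed) stored under a spans key.
doc.spans["sc"] = [
    Span(doc, 0, 5, label="SERVICE"),  # "The staff was very rude"
    Span(doc, 6, 10, label="FOOD"),    # "the pizza was amazing"
]

# DocBin is the container behind the .spacy files that `spacy train` reads;
# here it stays in memory instead of being written with to_disk().
doc_bin = DocBin(docs=[doc])
restored = list(doc_bin.get_docs(nlp.vocab))[0]

print([(span.text, span.label_) for span in restored.spans["sc"]], restored.cats)
```

The same document can hold both annotation types, so one annotated corpus could be used to compare a TextCat and a SpanCat on the same examples.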

For this approach, I do have some extra questions:

I’m currently using the en_core_web_trf pipeline, which includes a Tagger, Parser, Lemmatizer, and NER all sharing the same transformer.

If I add a TextCat or SpanCat, should I:

If I opt for the transformer approach, should I:

If added as a listener, my understanding is that there are two main options:

Sentiment Scoring:


3) NER/SpanCat + Relation

If I'm reading this correctly, this could be a viable approach if the NER step is treated as a SpanCat for the "aspect", with the relation label then allowing for more nuanced extraction. Continuing my example above and making it more complex:

                 /----------------------------------FOOD QUALITY----------------------v      
                /--------SERVICE QUALITY--------v                                     v
"Restaurant [ABC:ENTITY] has [very rude staff:ASPECT] but their [pizzas are an out of body experience:ASPECT]"

Sentiment Scoring: Follow this model with a TextCat for sentiment classification of the aspect-labeled entities/spans.
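
As a sketch of what the annotated sentence in approach 3 could look like as data, assuming spaCy v3: the entity goes in doc.ents, the aspect spans go under a spans key, and the relations live in a custom extension. The extension name "relations" and its list-of-tuples layout are assumptions made for illustration, not an existing spaCy component or format.

```python
import spacy
from spacy.tokens import Doc, Span

# Hypothetical custom attribute holding (head span, aspect span, relation label) tuples.
Doc.set_extension("relations", default=None, force=True)

nlp = spacy.blank("en")  # tokenizer only
doc = nlp("Restaurant ABC has very rude staff but their pizzas are an out of body experience")

entity = Span(doc, 1, 2, label="ENTITY")          # "ABC"
service_aspect = Span(doc, 3, 6, label="ASPECT")  # "very rude staff"
food_aspect = Span(doc, 8, 15, label="ASPECT")    # "pizzas are an out of body experience"

doc.ents = [entity]
doc.spans["sc"] = [service_aspect, food_aspect]

# The relation label carries the taxonomy category linking the entity to each aspect.
doc._.relations = [
    (entity, service_aspect, "SERVICE QUALITY"),
    (entity, food_aspect, "FOOD QUALITY"),
]

triples = [(head.text, aspect.text, label) for head, aspect, label in doc._.relations]
print(triples)
```

A sentiment TextCat would then be run on the text of each aspect span, so every triple ends up with its own polarity.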
