elric

Reputation: 63

Extracting Product Attribute/Features from text

I've been assigned a task to extract features/attributes from product descriptions, for example:

Levi Strauss slim fit jeans
Big shopping bag in pink and gold

I need to extract attributes such as "jeans" and "slim fit", or "shopping bag", "pink" and "gold". The product listings are not limited to clothes; they can be almost anything.

I am not sure how to approach this problem. I tried a Named Entity Recognition (NER) solution and a part-of-speech (POS) tagging solution. The NER implementation fails to recognize any token, and in the POS solution most tokens are tagged NNP (proper noun), which doesn't help much. I need a way to distinguish between the brand name and the features of the product (for a t-shirt, e.g., the color or the design, such as round neck or v-neck).

I did implement a K-means solution, which clustered similar products together, but again, that is not the result I am looking for.

I'm just looking for someone to point me in the right direction.

Upvotes: 5

Views: 2175

Answers (2)

Gladuo

Reputation: 77

There are some papers on this topic; for example, OpenTag treats attribute extraction as a sequence labeling problem.
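To make the sequence-labeling framing concrete, here is a minimal sketch (not OpenTag's code) of how a product title can be annotated with BIO tags, one tag per token, and how predicted tags are decoded back into attribute values. The label names `BRAND`, `FIT`, `TYPE` and `COLOR` and the helper `bio_to_spans` are hypothetical; in practice a trained sequence labeler would produce the tags, and this only shows the target representation and the decoding step.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Decode BIO tags into (label, text) attribute spans.

    B-X starts a span of label X, I-X continues it, O is outside any span.
    An I-X that does not continue an open X span is treated as a new span.
    """
    spans: List[Tuple[str, str]] = []
    label: Optional[str] = None
    words: List[str] = []

    def close() -> None:
        # Flush the currently open span, if any.
        if label is not None:
            spans.append((label, " ".join(words)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            close()
            label, words = tag[2:], [token]
        elif tag.startswith("I-"):
            words.append(token)
        else:  # "O" ends any open span
            close()
            label, words = None, []
    close()
    return spans


# Hand-written (hypothetical) tags for the two titles from the question.
title1 = ["Levi", "Strauss", "slim", "fit", "jeans"]
tags1 = ["B-BRAND", "I-BRAND", "B-FIT", "I-FIT", "B-TYPE"]

title2 = ["Big", "shopping", "bag", "in", "pink", "and", "gold"]
tags2 = ["O", "B-TYPE", "I-TYPE", "O", "B-COLOR", "O", "B-COLOR"]

print(bio_to_spans(title1, tags1))
# [('BRAND', 'Levi Strauss'), ('FIT', 'slim fit'), ('TYPE', 'jeans')]
print(bio_to_spans(title2, tags2))
# [('TYPE', 'shopping bag'), ('COLOR', 'pink'), ('COLOR', 'gold')]
```

Because the brand gets its own label, this framing directly addresses separating "Levi Strauss" from "slim fit" and "jeans", which a generic NER or POS tagger does not do.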

Upvotes: 1

YACINE GACI

Reputation: 145

You can take advantage of recent advances in aspect-based sentiment analysis (ABSA). One branch of this field extracts aspect terms and opinion terms. An aspect is like an attribute (a feature of the entity/product/service), and the opinion is the value of that attribute. For example, the sentence "This restaurant serves good food and has a friendly staff" contains two aspect/opinion pairs: {("Food", "Good"), ("Staff", "Friendly")}.
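As a rough illustration of the pairing step, here is a naive sketch that assumes the aspect and opinion terms have already been extracted (their token positions are given by hand here). It pairs each opinion term with the nearest aspect term by token distance. This nearest-term heuristic is only a simple stand-in, not the learned pairing models used in ABSA research, and `pair_by_distance` is a hypothetical helper name.

```python
from typing import List, Tuple


def pair_by_distance(
    tokens: List[str], aspect_idx: List[int], opinion_idx: List[int]
) -> List[Tuple[str, str]]:
    """Pair each opinion term with the closest aspect term (by token position).

    Naive heuristic: assumes aspect/opinion positions are already known and
    that an opinion usually sits next to the aspect it describes.
    """
    pairs: List[Tuple[str, str]] = []
    for o in opinion_idx:
        # min() keeps the first aspect on ties, i.e. the leftmost one.
        a = min(aspect_idx, key=lambda i: abs(i - o))
        pairs.append((tokens[a], tokens[o]))
    return pairs


tokens = "This restaurant serves good food and has a friendly staff".split()
aspects = [4, 9]   # "food", "staff" (hypothetical output of an aspect extractor)
opinions = [3, 8]  # "good", "friendly" (hypothetical output of an opinion extractor)

print(pair_by_distance(tokens, aspects, opinions))
# [('food', 'good'), ('staff', 'friendly')]
```

For product titles the "aspect" is often implicit (a title says "pink", not "color: pink"), so in that setting the sequence-labeling view, where the label itself names the attribute, may fit more naturally.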

ABSA is traditionally applied to user reviews, but who knows? It may work just as well here. The main thing to understand is that ABSA solutions are domain-specific: you train your model for one specific domain, "clothing" in your case. Generalizing to multiple domains has shown poor results and is still a matter of active research.

I hope it helps.

Upvotes: 2
