
machine-learning nlp text-classification

user1439579

Reputation: 131

Classifier or heuristics?

I need to classify questions that ask the user to specify a brand. I have a set of samples containing the word "brand".

Positives like:

"What is your favorite cosmetic brand?",
"Which fragrance brand (if any) do you think this advert is for?"...

and negatives like:

"Is there any particular reason why you chose this brand?"

Of course, it's possible to train a two-class classifier on concrete samples. However, precision and recall will be poor. Is there any way to build something with good precision from a variety of positive samples?

Upvotes: 2

Answers (2)

rpd

Reputation: 482

Choosing a set of words as features using tf-idf and training a tree algorithm seems the easiest way to go, but I would also suggest trying k-means clustering in case one or more categories of answers that you would consider "neutral" emerge. This will possibly help you decide which of these you consider positive or negative, so that you can rework your feature vector and, subsequently, your algorithm.
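A minimal sketch of the tf-idf plus tree idea, assuming scikit-learn is available. The first three questions are the ones from the post; the other three and all the labels are made-up placeholders for illustration, and the bigram setting is just one plausible choice, not a tuned one.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# The first three questions come from the post; the rest are hypothetical.
questions = [
    "What is your favorite cosmetic brand?",
    "Which fragrance brand (if any) do you think this advert is for?",
    "Is there any particular reason why you chose this brand?",
    "Which brand of coffee do you buy most often?",   # hypothetical
    "How satisfied are you with this brand?",         # hypothetical
    "Would you recommend this brand to a friend?",    # hypothetical
]
# 1 = the question asks the user to name a brand, 0 = it does not.
labels = [1, 1, 0, 1, 0, 0]

# Unigrams and bigrams, so that phrases such as "which brand" become features.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, ngram_range=(1, 2)),
    DecisionTreeClassifier(random_state=0),
)
model.fit(questions, labels)

# Predictions on the training questions themselves (a sanity check only).
predictions = model.predict(questions)
```

With so few samples the tree simply memorizes the training set, so this only shows the wiring. Any real estimate of precision and recall would need held-out questions.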

I am also a huge fan of HMM variants (I have used them to perform energy disaggregation) and I suggest you have a look at the following. It might give you some extra ideas:

http://www.merl.com/publications/docs/TR2004-085.pdf

Upvotes: 0

Oded

Reputation: 41

Precision and recall do not have to be poor. You should try building a binary classifier (I would recommend an SVM or a decision tree for this purpose). I would recommend extracting features such as the number of occurrences of each word in a sample (or tf-idf), or the lengths of the words and sentences. I guess that the question word in the sentence will have a major impact on the classification.
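The features mentioned above could be extracted roughly like this. It is only a sketch: the `extract_features` helper, the feature names, and the `QUESTION_WORDS` list are my own assumptions, not something taken from a library.

```python
import re
from collections import Counter

# Hypothetical list of opening words that hint at the question type.
QUESTION_WORDS = {
    "what", "which", "who", "where", "when", "why", "how",
    "is", "are", "do", "does", "would",
}

def extract_features(question):
    """Return a dict of word counts, the opening question word, and length features."""
    tokens = re.findall(r"[a-z']+", question.lower())
    # Number of occurrences of each word in the sample.
    features = Counter(f"word={token}" for token in tokens)
    # The question word, taken here as the first token of the sentence.
    first = tokens[0] if tokens else ""
    features[f"first_word={first if first in QUESTION_WORDS else 'other'}"] = 1
    # Sentence length and average word length.
    features["n_tokens"] = len(tokens)
    features["mean_word_length"] = (
        sum(len(token) for token in tokens) / len(tokens) if tokens else 0.0
    )
    return dict(features)

features = extract_features("What is your favorite cosmetic brand?")
```

A dict like this can be turned into a numeric matrix (for example with scikit-learn's `DictVectorizer`) before it is passed to an SVM or a decision tree.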

In addition, please note that a good precision value is very easy to get when you do not care about recall.
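To illustrate the trade-off, here is a small sketch that computes precision and recall at two decision thresholds. The scores and labels are made-up numbers, not output from a real classifier, and treating precision as 1.0 when nothing is predicted positive is just a convention I picked.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when every score >= threshold is predicted positive."""
    predicted = [score >= threshold for score in scores]
    tp = sum(p and y == 1 for p, y in zip(predicted, labels))
    fp = sum(p and y == 0 for p, y in zip(predicted, labels))
    fn = sum((not p) and y == 1 for p, y in zip(predicted, labels))
    # Convention (an assumption): precision is 1.0 if nothing is predicted positive.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Made-up classifier scores and true labels (1 = asks for a brand).
scores = [0.95, 0.80, 0.70, 0.60, 0.40, 0.20]
labels = [1, 1, 0, 1, 0, 1]

low = precision_recall(scores, labels, threshold=0.5)   # (0.75, 0.75)
high = precision_recall(scores, labels, threshold=0.9)  # (1.0, 0.25)
```

Raising the threshold pushes precision to 1.0 here, but recall drops from 0.75 to 0.25, which is why a precision figure on its own says little.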

Upvotes: 1
