zaki41

Reputation: 81

How to build separate classifiers for each label in the dataset?

I have a list of columns, and each column needs to be assigned a label from a separate list of labels. For example, the two columns ALT_ID and MTRC_NM are matched with the labels Alternate ID and Metric Name respectively.

The fuzzy string matching is already taken care of. The problem is that I want to add a learning model on top of it.

Essentially, after the matched results are displayed, the user marks each match as CORRECT or INCORRECT. Based on this feedback and other features of the column (like minimum value, maximum value), I want to train a classifier so that the model eventually stops repeating the incorrect matches.

Note: In the first run, only the name of the column is used to produce the first set of results. After that, I want to use other features (like minimum value) to train the model.

The problem is that there can be 10,000 terms (or labels), maybe even more, and the user only marks matches as CORRECT or INCORRECT. For incorrect classifications, the user does not tell us what the correct classification should be.

I believe one solution could be to make separate classifiers for each label and based on the Correct/Incorrect feedback for a particular classification, we can use these feature vectors to train a classifier for this classification. So in the future, if the fuzzy string matching nominates Metric Name as the classification for some column, we can let the "Metric Name" classifier decide if it is correct or incorrect.

I don't know how to make separate classifiers for each label. I also don't know if this approach is feasible. Any other solution to this problem will also help.
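For illustration, the per-label idea described above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a tested design: the three-element feature vector (`fuzzy_score`, scaled minimum, scaled maximum), the helper names, and the tiny hand-written logistic regression are all hypothetical. A classifier is only created for a label once that label receives feedback, so labels that never show up cost nothing.

```python
import math
from collections import defaultdict


class OnlineLogistic:
    """Tiny logistic-regression classifier trained one example at a time."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, is_correct):
        # One stochastic-gradient step on the log loss.
        err = self.predict_proba(x) - (1.0 if is_correct else 0.0)
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


# Hypothetical feature layout: [fuzzy_score, scaled_min, scaled_max]
N_FEATURES = 3

# One classifier per label, created lazily on the first piece of feedback.
classifiers = defaultdict(lambda: OnlineLogistic(N_FEATURES))


def record_feedback(label, features, is_correct):
    """Train the label's own classifier on one CORRECT/INCORRECT judgement."""
    classifiers[label].update(features, is_correct)


def accept_match(label, features, threshold=0.5):
    """Let the label's classifier veto a match nominated by the fuzzy matcher."""
    if label not in classifiers:
        # No feedback for this label yet: fall back to trusting the matcher.
        return True
    return classifiers[label].predict_proba(features) >= threshold


# Made-up feedback for a single label, purely for demonstration.
for _ in range(200):
    record_feedback("Metric Name", [0.9, 0.0, 1.0], True)
    record_feedback("Metric Name", [0.9, 0.8, 0.9], False)
```

After the fuzzy matcher nominates a label, `accept_match(label, features)` would be called before showing the result. Each classifier only ever sees the feedback for its own label, so labels with little feedback will stay unreliable for a long time.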

Upvotes: 1

Views: 187

Answers (1)

YuseqYaseq

Reputation: 283

You do not want to create a separate model for each label, as training more than 10,000 models isn't really feasible. Two alternatives that come to mind are:

  1. Create a supervised learning model that takes one column (its name and features) as input and outputs a probability for each of the 10,000 labels. This model is trained only on the matches marked CORRECT.
  2. Create a reinforcement learning model with the same input, whose output maximises a reward function defined as +1 for each prediction marked CORRECT and -1 for each prediction marked INCORRECT. This model also tries to maximise the number of correct predictions, but it can learn from incorrect predictions at the same time, for example by predicting a score of -1 for an incorrect pair (x, y).
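
To make the "learn from incorrect pairs too" idea concrete, here is a minimal sketch. Note that it is not a reinforcement learning implementation: it swaps in a simpler supervised scorer over (column, label) pairs that turns the +1/-1 feedback into a 1/0 training target. The feature scheme (`pair_features`), the "wide"/"narrow" range bucket, and the example columns are all assumptions made up for illustration.

```python
import math

# One shared weight table for every label, instead of 10,000 separate models.
weights = {}


def pair_features(column_name, label, col_min, col_max):
    """Describe a (column, label) pair as sparse string features (hypothetical scheme)."""
    tokens = column_name.lower().split("_")
    width = "wide" if col_max - col_min > 100 else "narrow"
    feats = [f"{label}|tok={t}" for t in tokens]
    feats.append(f"{label}|range={width}")
    feats.append(f"{label}|bias")
    return feats


def score(feats):
    """Estimated probability that the pair would be marked CORRECT."""
    z = sum(weights.get(f, 0.0) for f in feats)
    return 1.0 / (1.0 + math.exp(-z))


def learn(feats, reward, lr=0.5):
    """One gradient step; reward is +1 for CORRECT and -1 for INCORRECT."""
    target = 1.0 if reward > 0 else 0.0
    err = score(feats) - target
    for f in feats:
        weights[f] = weights.get(f, 0.0) - lr * err


# Made-up curated feedback: (column, label, min, max, reward).
feedback = [
    ("MTRC_NM", "Metric Name", 0, 5, +1),
    ("MTRC_ID", "Metric Name", 1, 99999, -1),
]
for _ in range(50):
    for col, label, lo, hi, reward in feedback:
        learn(pair_features(col, label, lo, hi), reward)

good = score(pair_features("MTRC_NM", "Metric Name", 0, 5))
bad = score(pair_features("MTRC_ID", "Metric Name", 1, 99999))
```

Because every feature here is crossed with the label, feedback on one label does not transfer to another. Adding a few features that are not crossed with the label would let the model share what it learns across labels, at the cost of a less label-specific fit.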

Upvotes: 1
