apetros85

Reputation: 461

Is duplicating data a valid way to fix bias?

I’m reading an engineering paper. The authors have a labelled dataset that is biased: there are many more instances labelled A than B. They want to train a classifier that predicts the A or B label from some inputs (states).

The authors say:

To artificially remedy this problem, random replicas of the B states are incorporated into the dataset to even out the lot.

I don’t know much about data analytics, but this doesn’t sound valid to me. Is it?

Upvotes: 2

Views: 42

Answers (1)

venkatadileep

Reputation: 183

This kind of data is usually called imbalanced data. What the authors did is a valid way to handle it: duplicating samples of the under-represented class (often called random oversampling) brings the classes into balance. Instead of adding copies purely at random, though, it can be better to look at the patterns in the data and add samples accordingly. There are many algorithms and methods for imbalanced classification; this question may help: https://datascience.stackexchange.com/questions/24392/why-we-need-to-handle-data-imbalance
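As a rough illustration of the duplication idea, here is a minimal sketch of random oversampling in Python with NumPy. It is not the paper's code: the function name `random_oversample`, the toy arrays and the seed are all made up for the example.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of the smaller classes
    until every class has as many rows as the largest one.
    (Hypothetical helper, written only for this illustration.)"""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    parts = []
    for cls, count in zip(classes, counts):
        cls_idx = np.flatnonzero(y == cls)
        # Draw the missing rows with replacement from this class
        extra = rng.choice(cls_idx, size=target - count, replace=True)
        parts.append(np.concatenate([cls_idx, extra]))
    idx = np.concatenate(parts)
    return X[idx], y[idx]

# Toy data (invented): 8 states labelled A, 2 labelled B
X = np.arange(20).reshape(10, 2)
y = np.array(["A"] * 8 + ["B"] * 2)

X_bal, y_bal = random_oversample(X, y)
labels, balanced_counts = np.unique(y_bal, return_counts=True)
print(dict(zip(labels, balanced_counts)))
```

One caveat worth keeping in mind: the duplication should normally be applied only to the training split, after the train/test split has been made. Otherwise copies of the same B row can land in both sets and make the test score look better than it really is.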

Upvotes: 1
