ReInvent_IO

Reputation: 477

How to oversample an image dataset using Python?

I am working on a multiclass classification problem with an imbalanced dataset of images (three classes). I tried the imblearn library, but it does not work on the image dataset.

I have a dataset of images belonging to three classes, namely A, B, and C. A has 1000 images, B has 300, and C has 100. I want to oversample classes B and C to avoid the class imbalance. Please let me know how to oversample an image dataset using Python.

Upvotes: 4

Views: 5260

Answers (1)

BenyaminGhN

Reputation: 35

It seems the samplers in imblearn.over_sampling only accept 2D inputs of shape (n_samples, n_features). So one way to oversample your image dataset with this library is to reshape around it:

  • reshape (flatten) your images into a 2D array
  • oversample them
  • reshape the new dataset back to the original image dims

Suppose your image dataset is a NumPy ndarray of shape (5000, 28, 28, 3). Following the steps above, you can use the solution below:

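# X : current dataset, a NumPy array of shape (5000, 28, 28, 3)
# y : labels, shape (5000,)

from imblearn.over_sampling import RandomOverSampler

# flatten each image into one row: (5000, 28, 28, 3) -> (5000, 2352)
reshaped_X = X.reshape(X.shape[0], -1)

# oversampling: by default, every class except the majority class is
# resampled (with replacement) up to the size of the majority class
oversample = RandomOverSampler(random_state=42)
oversampled_X, oversampled_y = oversample.fit_resample(reshaped_X, y)

# reshape X back to the original image dims: (n_samples, 28, 28, 3)
new_X = oversampled_X.reshape(-1, *X.shape[1:])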
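If you would rather not depend on imblearn (or skip the flatten/reshape round trip), plain random oversampling can also be sketched directly in NumPy. This is a minimal sketch, not imblearn's implementation: `random_oversample` is a hypothetical helper name, and, like `RandomOverSampler`'s default setting, it duplicates randomly chosen minority-class samples (with replacement) until every class matches the largest class. Because it only indexes along the first axis, it works on the 4D image array as-is. The random images below are placeholders standing in for a real dataset with the class sizes from the question.

```python
import numpy as np


def random_oversample(X, y, seed=None):
    """Hypothetical helper: duplicate randomly chosen samples of each
    minority class (with replacement) until every class has as many
    samples as the largest class.

    X can have any number of dimensions, e.g. (n_samples, 28, 28, 3);
    only the first axis is indexed, so no reshaping is needed.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()

    # Keep every original sample, then append extra indices per minority class.
    indices = [np.arange(len(y))]
    for cls, count in zip(classes, counts):
        if count < target:
            cls_idx = np.flatnonzero(y == cls)
            extra = rng.choice(cls_idx, size=target - count, replace=True)
            indices.append(extra)
    indices = np.concatenate(indices)
    return X[indices], y[indices]


# Placeholder data with the class sizes from the question: A=1000, B=300, C=100.
rng = np.random.default_rng(0)
y = np.array(["A"] * 1000 + ["B"] * 300 + ["C"] * 100)
X = rng.integers(0, 256, size=(len(y), 28, 28, 3), dtype=np.uint8)

new_X, new_y = random_oversample(X, y, seed=0)
print(new_X.shape)  # (3000, 28, 28, 3)
labels, counts = np.unique(new_y, return_counts=True)
print({str(c): int(n) for c, n in zip(labels, counts)})  # {'A': 1000, 'B': 1000, 'C': 1000}
```

Keep in mind that random oversampling only repeats existing images, so a model can overfit to the duplicated minority samples; applying data augmentation (flips, crops, and so on) to the training images is a common way to reduce that.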

Hope that helps!

Upvotes: 1
