Reputation: 477
I am working on a multiclass classification problem with an imbalanced dataset of images from different classes. I tried the imblearn library, but it does not work on the image dataset.
I have a dataset of images belonging to 3 classes, namely A, B, and C. A has 1000 images, B has 300, and C has 100. I want to oversample classes B and C to reduce the class imbalance. Please let me know how to oversample an image dataset using Python.
Upvotes: 4
Views: 5260
Reputation: 35
Actually, it seems that the samplers in imblearn.over_sampling only accept 2D inputs of shape (n_samples, n_features). So one way to oversample your image dataset with this library is to flatten each image before resampling and restore the original shape afterwards.
Suppose you have an image dataset of shape (5000, 28, 28, 3) stored as a numpy.ndarray. Following the approach above, you can use the solution below:
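# X : current dataset, a numpy.ndarray of shape (n_samples, 28, 28, 3)
# y : labels, one per sample
from imblearn.over_sampling import RandomOverSampler

# flatten each image into a 1D feature vector: (n_samples, 28*28*3)
reshaped_X = X.reshape(X.shape[0], -1)

# oversample the minority classes by duplicating their samples
oversample = RandomOverSampler()
oversampled_X, oversampled_y = oversample.fit_resample(reshaped_X, y)

# reshape X back to the original image dimensions, (28, 28, 3) here
new_X = oversampled_X.reshape(-1, *X.shape[1:])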
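Since random oversampling only duplicates existing samples, the same effect can be had without reshaping by resampling indices instead of pixels. The following is a minimal sketch of that idea using only the standard library, not imblearn's own implementation; the function name `oversample_indices` and the `seed` parameter are hypothetical, and the label list simply mirrors the class counts from the question.

```python
import random
from collections import Counter


def oversample_indices(labels, seed=0):
    """Return sample indices that bring every class up to the largest class size.

    Hypothetical helper for illustration; not part of imblearn.
    """
    rng = random.Random(seed)
    target = max(Counter(labels).values())

    # Group sample positions by class label.
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)

    indices = []
    for members in by_class.values():
        indices.extend(members)  # keep every original sample once
        # Draw the missing samples with replacement from the same class.
        indices.extend(rng.choices(members, k=target - len(members)))

    rng.shuffle(indices)
    return indices


# Class sizes taken from the question: A=1000, B=300, C=100.
labels = ["A"] * 1000 + ["B"] * 300 + ["C"] * 100
idx = oversample_indices(labels)
balanced = Counter(labels[i] for i in idx)
print(balanced)
```

With NumPy arrays, the resulting indices could then be applied as `X[idx]` and `y[idx]`, which leaves the image dimensions untouched. The same index list also works for a list of file paths when the images are not loaded into memory.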
Hope that was helpful!
Upvotes: 1