Dulangi_Kanchana

Reputation: 1233

imblearn library BorderlineSMOTE module does not generate any synthetic data

I tried to generate synthetic data with BorderlineSMOTE from the imblearn library, but no synthetic data was generated. I am working with a multiclass dataset. To generate data, I split my dataframe into a minority class and a majority class, as in binary classification. I then put the features in X and the target, consisting of 1s and 0s, in y. This approach worked with SVMSMOTE and SMOTENC from imblearn, but it does not work with BorderlineSMOTE.

from imblearn.over_sampling import BorderlineSMOTE

# Features and binary target (1s and 0s)
X = df.drop(['target'], axis=1)
y = df['target']

border_line = BorderlineSMOTE(random_state=42)
X_res, y_res = border_line.fit_resample(X, y)

The code does not raise an error, but X_res contains the same records as X, with no synthetic data added.
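One quick way to confirm that nothing was added is to compare class counts before and after resampling. The snippet below is only a minimal sketch: `added_per_class` is a hypothetical helper, not part of imblearn, and the toy label list stands in for `y` and `y_res`.

```python
from collections import Counter

def added_per_class(y_before, y_after):
    """Return how many rows each class gained between two label sequences."""
    before, after = Counter(y_before), Counter(y_after)
    labels = sorted(set(before) | set(after))
    return {label: after[label] - before[label] for label in labels}

# Toy stand-in: identical labels before and after, i.e. no synthetic rows.
y_toy = [0] * 8 + [1] * 2
print(added_per_class(y_toy, y_toy))  # {0: 0, 1: 0}
```

With real data, this would be called as `added_per_class(y, y_res)`; a zero for the minority class means the sampler produced nothing for it.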

Is the BorderlineSMOTE module deprecated in the imblearn library?
https://imbalanced-learn.org/stable/references/generated/imblearn.over_sampling.BorderlineSMOTE.html

Upvotes: 0

Views: 152

Answers (1)

SeaEngineering

Reputation: 174

According to the original implementation, BorderlineSMOTE only oversamples a specific type of data point: minority points that lie close to the decision boundary. To identify them, it needs the class labels of the examples around each minority point (https://miriamspsantos.github.io/pdf-files/IEEE-CIM-Version.pdf). If no minority points meet that criterion, there may be nothing to oversample. Are you giving the correct input?
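To make the borderline criterion concrete, here is a rough pure-Python sketch of the rule as I understand it from the Borderline-SMOTE description: a minority point is "noise" if all of its m nearest neighbours are majority, "danger" if at least half (but not all) are, and "safe" otherwise. Only "danger" points are used as seeds. The function name `borderline_labels`, the value of `m`, and the toy data are all assumptions for illustration; this is not imblearn's code.

```python
import math

def borderline_labels(X, y, minority_label, m=5):
    """Label each minority point 'safe', 'danger', or 'noise'
    based on the classes of its m nearest neighbours (Euclidean)."""
    labels = {}
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != minority_label:
            continue
        # Indices of all other points, nearest first.
        others = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(xi, X[j]),
        )
        neighbours = others[:m]
        n_majority = sum(1 for j in neighbours if y[j] != minority_label)
        if n_majority == m:
            labels[i] = "noise"      # surrounded by the other class
        elif n_majority >= m / 2:
            labels[i] = "danger"     # on the border: used as a seed
        else:
            labels[i] = "safe"       # deep inside its own class
    return labels

# Toy data (assumed): a tight minority cluster far from the majority.
X_sep = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
X_sep += [(10.0 + k, 10.0) for k in range(8)]
y_sep = [1] * 4 + [0] * 8
sep_labels = borderline_labels(X_sep, y_sep, minority_label=1, m=3)
print(sep_labels)  # {0: 'safe', 1: 'safe', 2: 'safe', 3: 'safe'}
```

In this toy case every minority point is "safe", so a borderline-only sampler would have no seeds to work from. That would be consistent with getting back the same records you put in, which is why it is worth checking how your minority class sits relative to the rest of the data.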

Also, because examples generated with SMOTE-like methods tend to have lower variability, I'd explore other approaches to synthetic data generation as well (ydata-synthetic is a nice starting point, for instance).

Upvotes: 1
