Reputation: 1233
I tried to generate synthetic data with BorderlineSMOTE from the imblearn library, but no synthetic data was generated. I am working with a multiclass dataset; to generate data, I split my dataframe into a minority class and a majority class, as in binary classification. I then put the features in X and the target, consisting of 1s and 0s, in y. This approach worked with SVMSMOTE and SMOTENC from imblearn, but it doesn't work with BorderlineSMOTE.
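from imblearn.over_sampling import BorderlineSMOTE

# df is the dataframe described above; 'target' holds the 0/1 labels
X = df.drop(['target'], axis=1)
y = df['target']
border_line = BorderlineSMOTE(random_state=42)
X_res, y_res = border_line.fit_resample(X, y)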
The code does not raise an error, but X_res contains the same records as X, with no synthetic data added.
Is BorderlineSMOTE deprecated in the imblearn library?
https://imbalanced-learn.org/stable/references/generated/imblearn.over_sampling.BorderlineSMOTE.html
Upvotes: 0
Views: 152
Reputation: 174
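According to the original implementation, BorderlineSMOTE oversamples only a specific subset of the minority class: the points that lie close to the decision boundary. In the original algorithm, a minority point counts as borderline ("in danger") when at least half, but not all, of its m nearest neighbors belong to another class, so the method needs the class labels of the surrounding examples (https://miriamspsantos.github.io/pdf-files/IEEE-CIM-Version.pdf). If none of your minority points meet that criterion, for example because the classes are well separated, there is nothing to oversample, which would be consistent with X_res matching X. Are you giving the correct input?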
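To check whether this is what is happening, you could count how many minority points actually fall in the borderline region. The snippet below is only a rough diagnostic sketch, not imblearn's code: `count_danger_points` is a hypothetical helper, it uses a brute-force neighbor search from the standard library, and the toy data is made up for illustration. It assumes the rule from the original Borderline-SMOTE paper (at least half, but not all, of the m nearest neighbors belong to another class). The demo uses m = 3 to keep the toy data small; as far as I know, imblearn's `m_neighbors` defaults to 10.

```python
import math


def count_danger_points(X, y, minority_label, m_neighbors=10):
    """Count minority samples that the borderline rule would pick for oversampling.

    Hypothetical helper for diagnosis only; brute force, so meant for small data.
    """
    danger = 0
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != minority_label:
            continue
        # Indices of the m nearest neighbors of xi, excluding xi itself.
        neighbors = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(xi, X[j]),
        )[:m_neighbors]
        n_other = sum(1 for j in neighbors if y[j] != minority_label)
        m = len(neighbors)
        # "In danger": at least half, but not all, neighbors are from another class.
        if m / 2 <= n_other < m:
            danger += 1
    return danger


# Toy 1-D data (made up): 5 minority points (label 1), 15 majority points (label 0).
y = [1] * 5 + [0] * 15
majority = [(100.0 + i,) for i in range(15)]

# Well-separated classes: every minority neighborhood is purely minority.
X_separated = [(float(i),) for i in range(5)] + majority

# Overlapping classes: minority points sit among the majority points.
X_mixed = [(100.4,), (100.6,), (105.4,), (105.6,), (110.5,)] + majority

n_separated = count_danger_points(X_separated, y, minority_label=1, m_neighbors=3)
n_mixed = count_danger_points(X_mixed, y, minority_label=1, m_neighbors=3)
print("danger points (separated):", n_separated)
print("danger points (mixed):", n_mixed)
```

If the count on your own data is zero, BorderlineSMOTE has no seed points to interpolate from, which would explain the unchanged output. On a real dataset, scikit-learn's `NearestNeighbors` would be a faster way to do the neighbor search.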
Also, because examples generated with SMOTE-like methods tend to have lower variability, I'd explore other approaches to synthetic data generation (ydata-synthetic is a nice starting point, for instance).
Upvotes: 1