Nankatsu

Reputation: 55

OneHotEncoding for categorical data

I have a dataframe like this:

        time     label
-----------------------
     morning      good
   afternoon      good
       night       bad
       night      okay

I want to one-hot encode the data so it can be used in SVM cross-validation. I tried the following:

from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.preprocessing import OneHotEncoder
from sklearn.svm import SVC

# OneHotEncoder expects 2D input, so select the column as a DataFrame
X = ds_df[['time']]
y = ds_df['label']

enc = OneHotEncoder()

X_vec = enc.fit_transform(X)

model = SVC(kernel='linear')

cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=69)
scores = cross_val_score(model, X_vec, y, cv=cv, scoring='precision_weighted')

Then I got this warning:

UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))

What should I do? Where did I go wrong?

Upvotes: 0

Views: 47

Answers (1)

skaarfacee

Reputation: 311

First, this is just a warning, not an error. Some labels don't appear among the predicted samples, so precision for those labels is undefined (there are no predictions of that label to divide by) and is set to 0.0.

As I mentioned, this is a warning, which Python treats differently from an error. The default behavior in most environments is to show a specific warning only once. This behavior can be changed:

import warnings
warnings.filterwarnings('ignore')  # "error", "ignore", "always", "default", "module" or "once"
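
Silencing every warning can hide unrelated problems, so a narrower filter may be preferable. The sketch below is one way to do that, assuming toy labels (`y_true`, `y_pred`) made up for illustration: it reproduces the situation where two labels are never predicted and ignores only `UndefinedMetricWarning` inside a limited scope.

```python
import warnings

from sklearn.exceptions import UndefinedMetricWarning
from sklearn.metrics import precision_score

# Toy data (an assumption for illustration): 'bad' and 'okay' are never predicted
y_true = ['good', 'good', 'bad', 'okay']
y_pred = ['good', 'good', 'good', 'good']

# Ignore only this warning category, and only inside the with-block
with warnings.catch_warnings():
    warnings.filterwarnings('ignore', category=UndefinedMetricWarning)
    score = precision_score(y_true, y_pred, average='weighted')

print(score)
```

The score itself is unchanged by the filter: labels without predictions still contribute a precision of 0.0 to the weighted average.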

If you are not interested in the scores of labels that were never predicted, you can explicitly specify the labels you are interested in when computing the metric.
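
As a rough sketch of that idea, `precision_score` accepts a `labels` argument that restricts which labels enter the average, and a `zero_division` argument (the one named in the warning text) that sets the value used for undefined precision. The toy arrays and variable names below are assumptions for illustration, not the asker's data.

```python
import numpy as np
from sklearn.metrics import make_scorer, precision_score

# Toy data (an assumption for illustration): the model only ever predicts 'good'
y_true = np.array(['good', 'good', 'bad', 'okay'])
y_pred = np.array(['good', 'good', 'good', 'good'])

# Option 1: score only the labels that were actually predicted
predicted_labels = np.unique(y_pred)
score_predicted = precision_score(
    y_true, y_pred, labels=predicted_labels, average='weighted'
)

# Option 2: keep all labels, but state explicitly that undefined precision counts as 0
score_all = precision_score(y_true, y_pred, average='weighted', zero_division=0)

# The same choice wrapped as a scorer that can be passed to cross_val_score(scoring=...)
weighted_precision = make_scorer(
    precision_score, average='weighted', zero_division=0
)
```

Option 1 changes what is measured, since unpredicted labels are dropped from the average. Option 2 keeps the default numbers and only removes the warning. With a dataset as small as the four rows shown in the question, some labels will often be missing from a test split, so the warning may simply reflect too little data.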

Upvotes: 1
