Reputation: 648
I am trying to calculate the log loss with sklearn but keep getting a ValueError. How can I resolve it? The code is simple: fit a LabelEncoder to the array of class values, then call sklearn's log_loss, which here takes three arguments: the ground truth, the predicted probability of each class, and the labels.
from sklearn import preprocessing
from sklearn.metrics import log_loss
# Fit the encoder on the float class values
le = preprocessing.LabelEncoder()
le.fit([2.5, 3.0, 3.5, 3.8, 4.0, 4.5, 5.0, 5.5, 6.0])
print(le.classes_)
# This call raises the ValueError shown below
log_loss([6.0], [[0., 0., 0., 0., 0.28571429, 0.14285714, 0., 0.57142857, 0. ]], labels=list(le.classes_))
Error
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
C:\Users\PRANAV~1\AppData\Local\Temp/ipykernel_25368/2311544075.py in <module>
----> 1 log_loss([6.0], [[0., 0., 0., 0., 0.28571429, 0.14285714,
2 0., 0.57142857, 0. ]], labels=list(le.classes_))
~\AppData\Roaming\Python\Python39\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
61 extra_args = len(args) - len(all_args)
62 if extra_args <= 0:
---> 63 return f(*args, **kwargs)
64
65 # extra_args > 0
~\AppData\Roaming\Python\Python39\site-packages\sklearn\metrics\_classification.py in log_loss(y_true, y_pred, eps, normalize, sample_weight, labels)
2233
2234 if labels is not None:
-> 2235 lb.fit(labels)
2236 else:
2237 lb.fit(y_true)
~\AppData\Roaming\Python\Python39\site-packages\sklearn\preprocessing\_label.py in fit(self, y)
295
296 self.sparse_input_ = sp.issparse(y)
--> 297 self.classes_ = unique_labels(y)
298 return self
299
~\AppData\Roaming\Python\Python39\site-packages\sklearn\utils\multiclass.py in unique_labels(*ys)
96 _unique_labels = _FN_UNIQUE_LABELS.get(label_type, None)
97 if not _unique_labels:
---> 98 raise ValueError("Unknown label type: %s" % repr(ys))
99
100 ys_labels = set(chain.from_iterable(_unique_labels(y) for y in ys))
ValueError: Unknown label type: ([2.5, 3.0, 3.5, 3.8, 4.0, 4.5, 5.0, 5.5, 6.0],)
Upvotes: 0
Views: 267
Reputation: 33147
What you are doing is not valid.
log_loss expects the input arguments y_true and y_pred, which are the ground-truth (correct) labels for n_samples samples and the predicted probabilities, as returned by a classifier's predict_proba method, respectively.
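For reference, the quantity being computed is the multiclass cross-entropy. Writing y_ik = 1 if sample i belongs to class k (0 otherwise) and p_ik for the predicted probability of class k, with the K columns of y_pred taken in sorted label order:

```latex
\mathrm{LogLoss} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{K} y_{ik}\,\log p_{ik}
```

Only the probability assigned to the true class contributes, so a true class predicted with probability 0 would give an infinite loss; sklearn clips the probabilities away from 0 and 1 to keep the result finite.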
To solve this, convert the numerical (invalid) labels into strings:
log_loss(['6.0'], [[0., 0., 0., 0., 0.28571429, 0.14285714, 0., 0.57142857, 0.]],
labels=list(le.classes_.astype(str)))
# 34.53877639491069 (the exact value depends on the clipping eps of your sklearn version)
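To see where 34.54 comes from, here is a minimal NumPy sketch of the computation. It assumes the clipping behaviour of older scikit-learn releases (eps=1e-15, then renormalising each row), and manual_log_loss is a hypothetical helper for illustration, not part of sklearn. The true class '6.0' is the last of the sorted labels, and its predicted probability is 0, so the loss is essentially -log(1e-15).

```python
import numpy as np

def manual_log_loss(y_true, y_pred, labels, eps=1e-15):
    """Hypothetical re-implementation of multiclass log loss, for illustration only."""
    labels = sorted(labels)                      # columns of y_pred follow sorted label order
    y_pred = np.asarray(y_pred, dtype=float)
    y_pred = np.clip(y_pred, eps, 1 - eps)       # avoid log(0)
    y_pred /= y_pred.sum(axis=1, keepdims=True)  # renormalise each row after clipping
    cols = [labels.index(y) for y in y_true]     # column index of each true label
    picked = y_pred[np.arange(len(y_true)), cols]
    return float(-np.mean(np.log(picked)))

probs = [[0., 0., 0., 0., 0.28571429, 0.14285714, 0., 0.57142857, 0.]]
labels = ['2.5', '3.0', '3.5', '3.8', '4.0', '4.5', '5.0', '5.5', '6.0']
loss = manual_log_loss(['6.0'], probs, labels)
print(loss)  # close to -log(1e-15), i.e. about 34.54
```

Newer sklearn versions choose eps from the dtype of y_pred, which is why the printed number may differ on your machine.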
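The problem: your labels are floats, and this breaks the function. In sklearn, numerical class labels need to be integers (or integer-valued floats such as 6.0); a label list containing non-integer floats such as 2.5 is treated as a continuous (regression) target rather than a set of classes.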
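You can check how sklearn classifies a set of labels with sklearn.utils.multiclass.type_of_target, which, as far as I can tell, is what unique_labels consults before raising "Unknown label type". A short sketch:

```python
from sklearn.utils.multiclass import type_of_target

float_labels = [2.5, 3.0, 3.5, 3.8, 4.0, 4.5, 5.0, 5.5, 6.0]

print(type_of_target(float_labels))                 # 'continuous': rejected by unique_labels
print(type_of_target([0, 1, 2, 3, 4, 5, 6, 7, 8]))  # 'multiclass': integer labels are fine
print(type_of_target([0.0, 1.0, 2.0]))              # 'multiclass': integer-valued floats are accepted
print(type_of_target(['2.5', '3.0', '6.0']))        # 'multiclass': string labels are fine
```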
Here is a full numerical example:
log_loss([6], [[0., 0., 0., 0., 0.28571429, 0.14285714, 0., 0.57142857, 0.]],
         labels=[0, 1, 2, 3, 4, 5, 6, 7, 8])
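If you would rather keep numeric labels, one option (a sketch, not the only approach) is to let the LabelEncoder you already fitted map the float values to the integer codes 0..8 and pass those codes to log_loss. Note that 6.0 maps to code 8, not 6, because it is the ninth class in sorted order; the integer example above uses 6, which points at the 5.0 column instead.

```python
import numpy as np
from sklearn import preprocessing
from sklearn.metrics import log_loss

# Fit the encoder on the original float class values
le = preprocessing.LabelEncoder()
le.fit([2.5, 3.0, 3.5, 3.8, 4.0, 4.5, 5.0, 5.5, 6.0])

# Map the true value 6.0 to its integer code (the last of the 9 sorted classes)
y_true = le.transform([6.0])
codes = np.arange(len(le.classes_))  # class codes 0..8, matching the probability columns

probs = [[0., 0., 0., 0., 0.28571429, 0.14285714, 0., 0.57142857, 0.]]
loss = log_loss(y_true, probs, labels=codes)
print(y_true, loss)
```

Because the probability columns are already in the encoder's sorted order, the codes line up with the columns without any reordering.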
Here is another problematic case:
log_loss([6], [[0., 0., 0., 0., 0.28571429, 0.14285714, 0., 0.57142857, 0.]],
         labels=[0, 1.1, 2, 3, 4, 5, 6, 7, 8])
# ValueError: Unknown label type: ([0, 1.1, 2, 3, 4, 5, 6, 7, 8],)
Upvotes: 1