Reputation: 449
I am working on binary logistic regression with entirely categorical data. I have one-hot encoded the data and am trying to fit a binary logistic regression, but I get the error below and have no idea how to deal with it. I understand that the last line gives some information, but I don't see where there could possibly be str values here.
[IN]: from sklearn.model_selection import train_test_split
[IN]: train_set, test_set = train_test_split(allyrs, test_size=0.2, random_state=42)
[IN]: X = train_set.iloc[:, 31:175]
# Set up binary y value
[IN]: y = train_set.iloc[:, 29]
# Set up multi-class y value
[IN]: ym = train_set.iloc[:, 30]
# First attempt to fit the model, which raises the error below:
[IN]:
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import LogisticRegressionCV
[IN]: BiLog_cv = LogisticRegressionCV(cv=3, random_state=0).fit(X, y)
AttributeError Traceback (most recent call last)
<ipython-input-19-2468362218dc> in <module>
1 # Binary Logistic Regresiion with Cross Validation - training set
2 # Fit Model
----> 3 BiLog_cv = LogisticRegressionCV(cv=3, random_state=0).fit(X, y)
~\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py in fit(self, X, y, sample_weight)
1883 prefer = 'processes'
1884
-> 1885 fold_coefs_ = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,
1886 **_joblib_parallel_args(prefer=prefer))(
1887 path_func(X, y, train, test, pos_class=label, Cs=self.Cs,
~\anaconda3\lib\site-packages\joblib\parallel.py in __call__(self, iterable)
1039 # remaining jobs.
1040 self._iterating = False
-> 1041 if self.dispatch_one_batch(iterator):
1042 self._iterating = self._original_iterator is not None
1043
~\anaconda3\lib\site-packages\joblib\parallel.py in dispatch_one_batch(self, iterator)
857 return False
858 else:
--> 859 self._dispatch(tasks)
860 return True
861
~\anaconda3\lib\site-packages\joblib\parallel.py in _dispatch(self, batch)
775 with self._lock:
776 job_idx = len(self._jobs)
--> 777 job = self._backend.apply_async(batch, callback=cb)
778 # A job can complete so quickly than its callback is
779 # called before we get here, causing self._jobs to
~\anaconda3\lib\site-packages\joblib\_parallel_backends.py in apply_async(self, func, callback)
206 def apply_async(self, func, callback=None):
207 """Schedule a func to be run"""
--> 208 result = ImmediateResult(func)
209 if callback:
210 callback(result)
~\anaconda3\lib\site-packages\joblib\_parallel_backends.py in __init__(self, batch)
570 # Don't delay the application, to avoid keeping the input
571 # arguments in memory
--> 572 self.results = batch()
573
574 def get(self):
~\anaconda3\lib\site-packages\joblib\parallel.py in __call__(self)
260 # change the default number of processes to -1
261 with parallel_backend(self._backend, n_jobs=self._n_jobs):
--> 262 return [func(*args, **kwargs)
263 for func, args, kwargs in self.items]
264
~\anaconda3\lib\site-packages\joblib\parallel.py in <listcomp>(.0)
260 # change the default number of processes to -1
261 with parallel_backend(self._backend, n_jobs=self._n_jobs):
--> 262 return [func(*args, **kwargs)
263 for func, args, kwargs in self.items]
264
~\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py in _log_reg_scoring_path(X, y, train, test, pos_class, Cs, scoring, fit_intercept, max_iter, tol, class_weight, verbose, solver, penalty, dual, intercept_scaling, multi_class, random_state, max_squared_sum, sample_weight, l1_ratio)
963 sample_weight = sample_weight[train]
964
--> 965 coefs, Cs, n_iter = _logistic_regression_path(
966 X_train, y_train, Cs=Cs, l1_ratio=l1_ratio,
967 fit_intercept=fit_intercept, solver=solver, max_iter=max_iter,
~\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py in _logistic_regression_path(X, y, pos_class, Cs, fit_intercept, max_iter, tol, verbose, solver, coef, class_weight, dual, penalty, intercept_scaling, multi_class, random_state, check_input, max_squared_sum, sample_weight, l1_ratio)
760 options={"iprint": iprint, "gtol": tol, "maxiter": max_iter}
761 )
--> 762 n_iter_i = _check_optimize_result(
763 solver, opt_res, max_iter,
764 extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG)
~\anaconda3\lib\site-packages\sklearn\utils\optimize.py in _check_optimize_result(solver, result, max_iter, extra_warning_msg)
241 " https://scikit-learn.org/stable/modules/"
242 "preprocessing.html"
--> 243 ).format(solver, result.status, result.message.decode("latin1"))
244 if extra_warning_msg is not None:
245 warning_msg += "\n" + extra_warning_msg
AttributeError: 'str' object has no attribute 'decode'
Can someone give me some insight, please? I just ran this same dataset through Categorical NB and got it to work.
Thank you
Upvotes: 1
Views: 2912
Reputation: 2042
In the most recent version of scikit-learn (0.24.1 at the time of writing), the problem has been fixed by enclosing part of the code in a try/except block. This was explained in more detail by Gigioz in this Stack Overflow question.
To upgrade scikit-learn, run the command below:
pip install -U scikit-learn
Then restart the kernel.
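After restarting, it can help to confirm which versions the kernel actually sees. The snippet below is a minimal sketch, assuming Python 3.8 or later (for importlib.metadata); installed_version is a hypothetical helper name, not part of scikit-learn. SciPy and joblib are listed only because they appear in, or sit behind, the traceback above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Print the versions visible to the current interpreter / kernel.
for name in ("scikit-learn", "scipy", "joblib"):
    print(name, installed_version(name))
```

If the printed scikit-learn version is still older than 0.24, the upgrade probably went into a different environment than the one the notebook kernel uses.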
Upvotes: 5
Reputation: 11330
I know nothing about this package, but your error is raised on line 243. The function decode is called on a byte array to convert it into a string. But it appears that result.message is already a string. Try just deleting .decode("latin1").
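To illustrate the bytes versus str distinction described above, here is a small sketch in plain Python. The helper name message_to_text and the sample messages are made up for illustration; this is not scikit-learn's actual code, only the general pattern of accepting either type instead of calling .decode unconditionally.

```python
def message_to_text(message, encoding="latin1"):
    """Return the message as str, whether it arrives as bytes or as str."""
    if isinstance(message, bytes):
        # bytes objects have .decode(); str objects do not
        return message.decode(encoding)
    return str(message)


# Illustrative inputs (hypothetical messages, not real optimizer output)
print(message_to_text(b"message stored as bytes"))
print(message_to_text("message stored as str"))

# Calling .decode on a str reproduces the kind of error in the traceback
try:
    "message stored as str".decode("latin1")
except AttributeError as exc:
    print("AttributeError:", exc)
```

Editing files under site-packages by hand does work as a stopgap, but the change is lost on the next reinstall, so upgrading the package is usually the cleaner route.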
Upvotes: 0