Reputation: 138
When I ran the code below yesterday, it was working, but when I run it today I get the error shown. I think the problem started when I revised my data, but when I try the old data I still get the same error. (I'm not sure whether it's related to the data's shape, but I'm showing it just in case.) Can someone help me?
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, random_state = 0)
print("Shape of x_train :", x_train.shape)
print("Shape of x_test :", x_test.shape)
print("Shape of y_train :", y_train.shape)
print("Shape of y_test :", y_test.shape)
Shape of x_train : (257763, 96)
Shape of x_test : (64441, 96)
Shape of y_train : (257763,)
Shape of y_test : (64441,)
from imblearn.ensemble import BalancedRandomForestClassifier
model = BalancedRandomForestClassifier(n_estimators = 200, random_state = 0, max_depth=6)
model.fit(x_train, y_train)
The full error is below:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-9-7698c432c37d> in <module>
      7
      8 model = BalancedRandomForestClassifier(n_estimators = 200, random_state = 0, max_depth=6)
----> 9 model.fit(x_train, y_train)
     10 y_pred_rf = model.predict(x_test)
     11

/opt/anaconda/envs/env_python/lib/python3.6/site-packages/imblearn/ensemble/_forest.py in fit(self, X, y, sample_weight)
    433                 s, t, self, X, y, sample_weight, i, len(trees),
    434                 verbose=self.verbose, class_weight=self.class_weight)
--> 435             for i, (s, t) in enumerate(zip(samplers, trees)))
    436         samplers, trees = zip(*samplers_trees)
    437

/opt/anaconda/envs/env_python/lib/python3.6/site-packages/joblib/parallel.py in __call__(self, iterable)
    919                 # remaining jobs.
    920                 self._iterating = False
--> 921             if self.dispatch_one_batch(iterator):
    922                 self._iterating = self._original_iterator is not None
    923

/opt/anaconda/envs/env_python/lib/python3.6/site-packages/joblib/parallel.py in dispatch_one_batch(self, iterator)
    757                 return False
    758             else:
--> 759                 self._dispatch(tasks)
    760                 return True
    761

/opt/anaconda/envs/env_python/lib/python3.6/site-packages/joblib/parallel.py in _dispatch(self, batch)
    714         with self._lock:
    715             job_idx = len(self._jobs)
--> 716             job = self._backend.apply_async(batch, callback=cb)
    717             # A job can complete so quickly than its callback is
    718             # called before we get here, causing self._jobs to

/opt/anaconda/envs/env_python/lib/python3.6/site-packages/joblib/_parallel_backends.py in apply_async(self, func, callback)
    180     def apply_async(self, func, callback=None):
    181         """Schedule a func to be run"""
--> 182         result = ImmediateResult(func)
    183         if callback:
    184             callback(result)

/opt/anaconda/envs/env_python/lib/python3.6/site-packages/joblib/_parallel_backends.py in __init__(self, batch)
    547         # Don't delay the application, to avoid keeping the input
    548         # arguments in memory
--> 549         self.results = batch()
    550
    551     def get(self):

/opt/anaconda/envs/env_python/lib/python3.6/site-packages/joblib/parallel.py in __call__(self)
    223         with parallel_backend(self._backend, n_jobs=self._n_jobs):
    224             return [func(*args, **kwargs)
--> 225                     for func, args, kwargs in self.items]
    226
    227     def __len__(self):

/opt/anaconda/envs/env_python/lib/python3.6/site-packages/joblib/parallel.py in <listcomp>(.0)
    223         with parallel_backend(self._backend, n_jobs=self._n_jobs):
    224             return [func(*args, **kwargs)
--> 225                     for func, args, kwargs in self.items]
    226
    227     def __len__(self):

/opt/anaconda/envs/env_python/lib/python3.6/site-packages/imblearn/ensemble/_forest.py in _local_parallel_build_trees(sampler, tree, forest, X, y, sample_weight, tree_idx, n_trees, verbose, class_weight)
     43     tree = _parallel_build_trees(tree, forest, X_resampled, y_resampled,
     44                                  sample_weight, tree_idx, n_trees,
---> 45                                  verbose=verbose, class_weight=class_weight)
     46     return sampler, tree
     47

/opt/anaconda/envs/env_python/lib/python3.6/site-packages/sklearn/ensemble/_forest.py in _parallel_build_trees(tree, forest, X, y, sample_weight, tree_idx, n_trees, verbose, class_weight, n_samples_bootstrap)
    153         indices = _generate_sample_indices(tree.random_state, n_samples,
    154                                            n_samples_bootstrap)
--> 155         sample_counts = np.bincount(indices, minlength=n_samples)
    156         curr_sample_weight *= sample_counts
    157

<__array_function__ internals> in bincount(*args, **kwargs)

ValueError: object of too small depth for desired array
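(For reference, a quick sanity check one might run on the inputs before fit — variable names as in the code above; dtype and dimensionality are the usual things that silently change when data is revised:)
import numpy as np

# Confirm the inputs still look the way fit() expects:
# x should be 2-D, y 1-D, both with numeric dtypes.
for name, obj in [("x_train", x_train), ("y_train", y_train)]:
    arr = np.asarray(obj)
    print(name, arr.shape, arr.ndim, arr.dtype)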
Upvotes: 2
Views: 5050
Reputation: 1
One small hack is to install the latest version of Python in Jupyter notebook (for me, installing 3.7.4 worked). On the older Python version the error still persists.
I had the same issue. I installed Jupyter notebook on my computer, where the Python version is 3.7.4, and BalancedRandomForestClassifier works completely fine there. However, when I try to run it on an older version, say Python 3.6, I hit the same error mentioned above.
My feature matrix (BoW) is a two-dimensional array as well:
array([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]])
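A quick way to double-check that dimensionality (a sketch; bow stands in for whatever array actually holds the features):
import numpy as np

bow = np.zeros((3, 5), dtype=int)   # placeholder for the real BoW matrix
print(bow.ndim)                     # expect 2: (n_samples, n_features)
assert np.asarray(bow).ndim == 2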
[screenshot: Jupyter notebook on my machine]
[screenshot: Jupyter notebook on my Google Colab]
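Since the behaviour differs between environments, it may also help to print the interpreter and library versions side by side on both machines (a sketch; library versions can differ between Python environments even when the code is identical):
import sys
import sklearn
import imblearn

# Compare this output between the working (3.7.4) and failing (3.6) setups.
print("python:", sys.version.split()[0])
print("scikit-learn:", sklearn.__version__)
print("imbalanced-learn:", imblearn.__version__)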
Upvotes: 0
Reputation: 231510
According to the traceback, the error is raised by bincount. This reproduces it:
In [13]: np.bincount(0)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-13-65825aeaf27a> in <module>
----> 1 np.bincount(0)
<__array_function__ internals> in bincount(*args, **kwargs)
ValueError: object of too small depth for desired array
In [14]: np.bincount(np.arange(5))
Out[14]: array([1, 1, 1, 1, 1])
bincount works with a 1d array; it raises this error if given a scalar. Now work your way back through the traceback to figure out which variable in your code is a scalar when it should be an array.
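A minimal sketch of such a check (the helper name is made up): wrap each candidate variable and look for ndim == 0, since a 0-d value is exactly what bincount chokes on.
import numpy as np

def check_1d(name, obj):
    # Print dimensionality so scalars (ndim == 0) stand out.
    arr = np.asarray(obj)
    print(f"{name}: ndim={arr.ndim}, shape={arr.shape}")

check_1d("ok", np.arange(5))   # ndim=1 -> fine for bincount
check_1d("bad", 0)             # ndim=0 -> would raise the error above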
Upvotes: 2