Reputation: 605
I have created features X and labels y for the dataset I am working on.
I want to train a random forest classifier on it, but fitting the classifier on the training data raises a ValueError: setting an array element with a sequence.
Below are X, y, the code, and the error details.
X:
(array([-8.1530527e-10, 8.9952795e-10, -9.1185753e-10, ...,
0.0000000e+00, 0.0000000e+00, 0.0000000e+00], dtype=float32),
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
array([-0.00050612, -0.00057967, -0.00035985, ..., 0. ,
0. , 0. ], dtype=float32),
array([ 6.8139506e-08, -2.3837963e-05, -2.4622474e-05, ...,
3.1678758e-06, -2.4535689e-06, 0.0000000e+00], dtype=float32),
array([ 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ...,
6.9306935e-07, -6.6020442e-07, 0.0000000e+00], dtype=float32),
array([-7.30260945e-05, -1.18022966e-04, -1.08280736e-04, ...,
8.83421380e-05, 4.97258679e-06, 0.00000000e+00], dtype=float32),
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
array([ 2.3406714e-05, 3.1186773e-05, 4.9467826e-06, ...,
1.2180173e-07, -9.2944845e-08, 0.0000000e+00], dtype=float32),
array([ 1.1845550e-06, -1.6399191e-06, 2.5565218e-06, ...,
-8.7445065e-09, 5.9859917e-09, 0.0000000e+00], dtype=float32),
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
array([-1.3284328e-05, -7.4090644e-07, 7.2679302e-07, ...,
0.0000000e+00, 0.0000000e+00, 0.0000000e+00], dtype=float32),
array([ 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ...,
5.0694009e-08, -3.4546797e-08, 0.0000000e+00], dtype=float32),
array([ 1.5591205e-07, -1.5845627e-07, 1.5362870e-07, ...,
0.0000000e+00, 0.0000000e+00, 0.0000000e+00], dtype=float32),
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
array([0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ..., 1.1608539e-05,
8.2463991e-09, 0.0000000e+00], dtype=float32),
array([-3.6192148e-07, -1.4590451e-05, -5.3999561e-06, ...,
-1.9935460e-05, -3.4417746e-05, 0.0000000e+00], dtype=float32),
array([ 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ...,
-2.5319534e-07, 2.6521766e-07, 0.0000000e+00], dtype=float32),
array([ 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ...,
-2.5055220e-08, 1.2936166e-08, 0.0000000e+00], dtype=float32),
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
array([ 1.3387315e-05, 6.0913658e-07, -5.6471418e-07, ...,
0.0000000e+00, 0.0000000e+00, 0.0000000e+00], dtype=float32),
array([ 1.7200684e-02, 3.2272514e-02, 3.2961801e-02, ...,
-1.6286784e-06, -8.5592075e-07, 0.0000000e+00], dtype=float32),
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
array([ 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ...,
-3.3923173e-11, 2.8026699e-11, 0.0000000e+00], dtype=float32),
array([-0.00103188, -0.00075814, -0.00051426, ..., 0. ,
0. , 0. ], dtype=float32),
array([ 7.6278877e-07, 2.1624428e-05, 1.1150542e-05, ...,
1.8263392e-09, -1.5558380e-09, 0.0000000e+00], dtype=float32),
array([-1.2111740e-07, 6.3130176e-07, -1.8378003e-06, ...,
1.1309878e-05, 5.4562256e-06, 0.0000000e+00], dtype=float32),
array([0.00026949, 0.00028119, 0.00020081, ..., 0.00032586, 0.00046612,
0. ], dtype=float32),
array([ 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ...,
-7.8796054e-09, 1.7431153e-08, 0.0000000e+00], dtype=float32),
array([1.42000988e-06, 1.30781755e-05, 2.77493709e-05, ...,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00], dtype=float32),
array([ 2.9161662e-10, -6.3629275e-11, -3.0565092e-10, ...,
0.0000000e+00, 0.0000000e+00, 0.0000000e+00], dtype=float32),
array([ 2.2051008e-05, 1.6838792e-05, 3.5639907e-05, ...,
4.5767497e-06, -1.2002213e-05, 0.0000000e+00], dtype=float32),
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
array([ 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ...,
-2.0104826e-10, 1.6824393e-10, 0.0000000e+00], dtype=float32),
array([ 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ...,
-4.8303300e-06, -1.2008861e-05, 0.0000000e+00], dtype=float32),
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
array([ 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ...,
-2.7673337e-07, 2.8604177e-07, 0.0000000e+00], dtype=float32),
array([-0.00066044, -0.0009837 , -0.00090796, ..., -0.00171516,
-0.0017666 , 0. ], dtype=float32),
array([ 3.2218946e-11, -5.5296181e-11, 8.9530647e-11, ...,
0.0000000e+00, 0.0000000e+00, 0.0000000e+00], dtype=float32),
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
array([-1.3284328e-05, -7.4090644e-07, 7.2679302e-07, ...,
0.0000000e+00, 0.0000000e+00, 0.0000000e+00], dtype=float32),
array([ 4.9886359e-05, 1.4642075e-04, 4.4365996e-04, ...,
6.3584002e-07, -6.2395281e-07, 0.0000000e+00], dtype=float32),
array([-3.2826196e-04, 4.5522624e-03, -8.2306744e-04, ...,
-2.2519816e-07, -6.2417300e-08, 0.0000000e+00], dtype=float32),
array([ 3.1686827e-04, 4.6282235e-04, 1.0160641e-04, ...,
-1.4605960e-05, 6.6572487e-05, 0.0000000e+00], dtype=float32),
array([ 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ...,
-7.1763244e-09, -2.8297892e-08, 0.0000000e+00], dtype=float32),
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
array([-2.5870585e-07, 4.6514080e-07, -9.5607948e-07, ...,
0.0000000e+00, 0.0000000e+00, 0.0000000e+00], dtype=float32),
array([ 5.788035e-07, -6.493598e-07, 7.111379e-07, ..., 0.000000e+00,
0.000000e+00, 0.000000e+00], dtype=float32),
array([ 2.5118000e-04, 1.4220485e-03, 3.9536849e-04, ...,
4.5242754e-04, -3.1405249e-05, 0.0000000e+00], dtype=float32),
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
array([ 1.1985266e-07, 2.1360799e-07, -1.1951373e-06, ...,
-1.3043609e-04, 1.2107374e-06, 0.0000000e+00], dtype=float32),
array([0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ..., 2.5944988e-08,
1.2123945e-07, 0.0000000e+00], dtype=float32),
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
array([-2.4280996e-06, -1.2362683e-05, -8.5034850e-07, ...,
-1.0113516e-11, 5.1403621e-12, 0.0000000e+00], dtype=float32),
array([9.6098862e-05, 1.6449913e-04, 1.1942573e-04, ..., 0.0000000e+00,
0.0000000e+00, 0.0000000e+00], dtype=float32),
array([ 1.3284328e-05, 7.4090644e-07, -7.2679302e-07, ...,
0.0000000e+00, 0.0000000e+00, 0.0000000e+00], dtype=float32),
array([ 2.4700081e-05, 2.9454704e-05, 8.0751715e-06, ...,
1.2746801e-07, -1.6574201e-06, 0.0000000e+00], dtype=float32),
array([8.4619669e-06, 9.7476968e-06, 2.0182479e-05, ..., 2.1081217e-11,
4.0220186e-10, 0.0000000e+00], dtype=float32),
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32))
y:
('08',
'08',
'06',
'05',
'05',
'04',
'06',
'07',
'01',
'04',
'03',
'07',
'03',
'01',
'03',
'03',
'02',
'02',
'02',
'02',
'05',
'06',
'04',
'08',
'07',
'06',
'04',
'05',
'07',
'02',
'08',
'01',
'08',
'03',
'08',
'02',
'03',
'06',
'04',
'07',
'04',
'07',
'05',
'06',
'08',
'08',
'04',
'05',
'05',
'04',
'06',
'07',
'05',
'07',
'01',
'06',
'02',
'02',
'03',
'03')
Code for the classifier plus the train/test split:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
from sklearn.tree import DecisionTreeClassifier
dtree = DecisionTreeClassifier()
dtree.fit(X_train, y_train)
Error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-70-b6417fbfb8de> in <module>()
1 from sklearn.tree import DecisionTreeClassifier
2 dtree = DecisionTreeClassifier()
----> 3 dtree.fit(X_train, y_train)
/usr/local/lib/python3.6/dist-packages/sklearn/tree/tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
788 sample_weight=sample_weight,
789 check_input=check_input,
--> 790 X_idx_sorted=X_idx_sorted)
791 return self
792
/usr/local/lib/python3.6/dist-packages/sklearn/tree/tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
114 random_state = check_random_state(self.random_state)
115 if check_input:
--> 116 X = check_array(X, dtype=DTYPE, accept_sparse="csc")
117 y = check_array(y, ensure_2d=False, dtype=None)
118 if issparse(X):
/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
431 force_all_finite)
432 else:
--> 433 array = np.array(array, dtype=dtype, order=order, copy=copy)
434
435 if ensure_2d:
ValueError: setting an array element with a sequence.
EDIT1: I converted both X and y into NumPy arrays, but I still get the same error. Details below:
import numpy as np
X = np.asarray(X)
y = np.asarray(y)
X.shape, y.shape
Output:
((60,), (60,))
Upvotes: 2
Views: 2540
Reputation: 1809
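The problem appears to be your X. Most likely at least one of the arrays in it has a different length from the others. Scikit-learn converts the tuple you have built into a NumPy array when DecisionTreeClassifier processes it, and arrays of unequal length cannot be stacked into a 2-D numeric array. NumPy can only represent them as a 1-D object array whose elements are themselves arrays, which is not what the decision tree expects, so the conversion to a float array fails with this error. The shape (60,) in your edit, rather than (60, n_features), is consistent with this.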
Just check this code snippet:
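import numpy as np
from numpy import array

# X1: all three arrays have 6 elements, so they stack into a 2-D float32 array
X1 = (array([-8.1530527e-10, 8.9952795e-10, -9.1185753e-10,
0.0000000e+00, 0.0000000e+00, 0.0000000e+00], dtype='float32'),
array([0., 0., 0., 0., 0., 0.], dtype='float32'),
array([0., 0., 0., 0., 0., 0.], dtype='float32'))
# X2: the second array has 7 elements, so the tuple is ragged
X2 = (array([-8.1530527e-10, 8.9952795e-10, -9.1185753e-10,
0.0000000e+00, 0.0000000e+00, 0.0000000e+00], dtype='float32'),
array([0., 0., 0., 0., 0., 0., 1], dtype='float32'),
array([0., 0., 0., 0., 0., 0.], dtype='float32'))
# Note: on NumPy 1.24+ np.array(X2) raises a ValueError itself; pass dtype=object there to see the object array
print("X1:", np.array(X1).dtype, "\nX2:", np.array(X2).dtype)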
Adding just one extra number to the second element of X2 turns the resulting array into an object array (dtype object) instead of a 2-D float32 array.
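If that is the cause, one way to confirm it and work around it is sketched below. This is only an illustration under assumptions: the small X here is a hypothetical stand-in for the real data, pad_to_common_length is a helper name made up for this example, and right-padding with zeros is a guess at what suits these features (truncating to the shortest length, or extracting fixed-size features, may be more appropriate).

```python
import numpy as np


def pad_to_common_length(arrays, fill_value=0.0):
    """Right-pad 1-D arrays with fill_value so all rows share the longest length.

    Hypothetical helper for illustration; zero-padding is an assumption.
    """
    max_len = max(len(a) for a in arrays)
    out = np.full((len(arrays), max_len), fill_value, dtype=np.float32)
    for i, a in enumerate(arrays):
        out[i, :len(a)] = a
    return out


# Hypothetical stand-in for the question's X: three samples, one of them longer.
X = (
    np.zeros(6, dtype=np.float32),
    np.ones(7, dtype=np.float32),
    np.zeros(6, dtype=np.float32),
)

# Diagnose: more than one distinct length means X is ragged.
lengths = sorted({len(a) for a in X})
print("distinct lengths:", lengths)

# Work around: build a proper 2-D float32 array of shape (n_samples, n_features).
X_2d = pad_to_common_length(X)
print(X_2d.shape, X_2d.dtype)
```

Once X is a 2-D array like X_2d, it can be passed to train_test_split and dtree.fit in place of the original tuple.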
Upvotes: 1