Reputation: 17869
It's a very basic concept: each of my training examples has more than one input field. My data is all text, and every example has three separate fields. Every example I have been able to find has text data set up like this:
data = ['text1','text2',...]
where mine looks like:
data = [['text1','text2','text3'],[...],...]
but when I try to fit on this data I get the following traceback:
ValueError Traceback (most recent call last)
<ipython-input-25-e3356a0f62f8> in <module>()
----> 1 classifier.fit(X,y)
/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sklearn/svm/base.pyc in fit(self, X, y, sample_weight)
140 "by not using the ``sparse`` parameter")
141
--> 142 X = atleast2d_or_csr(X, dtype=np.float64, order='C')
143
144 if self.impl in ['c_svc', 'nu_svc']:
/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sklearn/utils/validation.pyc in atleast2d_or_csr(X, dtype, order, copy)
114 """
115 return _atleast2d_or_sparse(X, dtype, order, copy, sparse.csr_matrix,
--> 116 "tocsr")
117
118
/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sklearn/utils/validation.pyc in _atleast2d_or_sparse(X, dtype, order, copy, sparse_class, convmethod)
94 _assert_all_finite(X.data)
95 else:
---> 96 X = array2d(X, dtype=dtype, order=order, copy=copy)
97 _assert_all_finite(X)
98 return X
/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sklearn/utils/validation.pyc in array2d(X, dtype, order, copy)
78 raise TypeError('A sparse matrix was passed, but dense data '
79 'is required. Use X.toarray() to convert to dense.')
---> 80 X_2d = np.asarray(np.atleast_2d(X), dtype=dtype, order=order)
81 _assert_all_finite(X_2d)
82 if X is X_2d and copy:
/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/numeric.pyc in asarray(a, dtype, order)
318
319 """
--> 320 return array(a, dtype, copy=False, order=order)
321
322 def asanyarray(a, dtype=None, order=None):
ValueError: setting an array element with a sequence.
Is there a specific way I have to approach this? Thank you!
NOTES:
All of the text data I am using is vectorized by a HashingVectorizer.
clf.fit(X,y)
where X is a list of lists that each contain 3 vectorized texts, and y is a list of the respective categories that each element of X belongs to.
Upvotes: 4
Views: 10766
Reputation: 2482
X has to be a 2-dimensional array (or a list of lists, if you prefer). Each inner list has to contain only numeric values, and all of these lists must have the same length, like this: [[1,2,3,5],[3,4,5,6],[6,7,8,9],...]. If you have several text entries per object and vectorize each of them, you need to combine the resulting vectors into a single list, for example by concatenating them, if that makes sense in your context. In the end, each object has to be represented by a single list in which every entry is numeric, and all objects must be represented by lists of equal length, where corresponding elements in all the lists represent the same feature (e.g. the frequency of the same token in your texts). Let me know whether this makes sense.
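As a rough sketch of the concatenation idea, the snippet below vectorizes each of the three text fields separately and then stacks the results side by side, so that every example becomes one numeric row. The toy data, the labels, the `n_features` value, and the choice of `LinearSVC` are all assumptions made for illustration and are not taken from the question. Since `HashingVectorizer` returns sparse matrices, `scipy.sparse.hstack` is used for the concatenation instead of building Python lists.

```python
from scipy.sparse import hstack
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.svm import LinearSVC

# Hypothetical toy data: each example has three text fields.
data = [
    ["cheap pills", "buy now", "unknown sender"],
    ["meeting notes", "see agenda", "work colleague"],
    ["win money", "click here", "unknown sender"],
    ["project update", "status report", "work colleague"],
]
y = ["spam", "ham", "spam", "ham"]  # hypothetical labels

# Assumed hash size; smaller than the default to keep the example light.
n_features = 2 ** 10
vectorizer = HashingVectorizer(n_features=n_features)

# Regroup the rows by field: fields[0] holds the first text of every example, etc.
fields = list(zip(*data))

# Vectorize each field on its own; each block has shape (n_samples, n_features).
blocks = [vectorizer.transform(field) for field in fields]

# Concatenate the blocks column-wise so each example is a single numeric row.
X = hstack(blocks).tocsr()

classifier = LinearSVC()
classifier.fit(X, y)

print(X.shape)  # (n_samples, 3 * n_features)
```

Keeping the three blocks in separate column ranges means the same token is counted as a different feature depending on which field it appeared in. If that distinction does not matter for your problem, joining the three strings into one text before vectorizing is a simpler alternative.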
Upvotes: 6