Reputation: 15
I'm trying to run PCA using sklearn on a dataset with 162 columns and 69,000 rows. I keep getting the float error message below and I've checked to make sure I only have numerical data. What could I be doing wrong? Any help would be really appreciated.
>>> data = np.loadtxt("PCAdata.txt")
>>> trans = data.transpose()
>>> trans
array([[0., 0., 1., ..., 0., 0., 1.],
[0., 0., 1., ..., 1., 0., 2.],
[0., 0., 1., ..., 0., 0., 1.],
...,
[1., 0., 1., ..., 0., 0., 1.],
[0., 0., 1., ..., 0., 0., 2.],
[0., 0., 1., ..., 0., 0., 2.]])
>>> sscaler = preprocessing.StandardScaler().fit(trans)
>>> sscaler
StandardScaler(copy=True, with_mean=True, with_std=True)
>>> pca = PCA(n_components=2)
>>> pca.fit(sscaler)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\sklearn\decomposition\pca.py", line 329, in fit
    self._fit(X)
  File "C:\Python27\lib\site-packages\sklearn\decomposition\pca.py", line 370, in _fit
    copy=self.copy)
  File "C:\Python27\lib\site-packages\sklearn\utils\validation.py", line 433, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)
TypeError: float() argument must be a string or a number
Upvotes: 0
Views: 378
Reputation: 1868
The fit method does not return a matrix; it returns the fitted scaler object itself. sklearn raises the error because the argument you pass to pca.fit, sscaler, is a StandardScaler object rather than a matrix of numbers. To get the scaled data matrix, use the fit_transform method, or call fit and transform separately.
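Example:
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

data = np.random.randint(0, 3, (100, 10))
scaler = StandardScaler()
data = scaler.fit_transform(data)  # returns the scaled array, not the scaler
pca = PCA()
data = pca.fit_transform(data)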
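For the variant with fit and transform called separately, here is a minimal sketch that keeps the variable names from the question. The random integer array is only a stand-in for the contents of PCAdata.txt, and the names scaled and reduced are made up for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): random integers in place of PCAdata.txt
rng = np.random.RandomState(0)
trans = rng.randint(0, 3, (100, 10)).astype(float)

# fit() learns the per-column mean and std and returns the scaler itself
sscaler = StandardScaler().fit(trans)

# transform() returns the scaled numeric array, which is what PCA needs
scaled = sscaler.transform(trans)

pca = PCA(n_components=2)
reduced = pca.fit_transform(scaled)  # pass the array, not sscaler
print(reduced.shape)
```

Passing scaled rather than sscaler to PCA is the only change needed relative to the session in the question.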
Upvotes: 1