Muhammed Eltabakh

Reputation: 497

PCA sklearn ValueError: could not convert string to float: '[1, 3]'

I have a pandas DataFrame that looks like this:

1   0   0   0   0   2   0   0   0   0   0   ... 0   0   1   2   0   0   0   0   0   0
2   0   0   0   3   0   0   0   4   0   0   ... 0   5   0   0   3   0   0   [1, 3]  0   0
3   0   0   0   0   0   0   0   2   0   0   ... 6   6   0   [2, 4]  0   2   0   0   0   0
4   1   0   6   1   0   0   0   0   0   0   ... 0   0   0   0   4   0   0   5   0   0
5   0   0   0   0   6   0   0   [2, 7]  0   0   ... 0   0   0   0   0   0   0   0   0   0

I'm trying to use PCA to reduce the dimensionality of my data, but some cells hold more than one value, such as [2, 7]. When I run PCA, I get this error:

data = pca.fit_transform(z)

ValueError: could not convert string to float: '[1, 3]'

How do I handle this?

Upvotes: 1

Views: 3864

Answers (1)

andrew_reece

Reputation: 21284

Vanilla PCA doesn't work when observations have varying lengths: it expects exactly one numeric value per cell.
If [1, 3] means there were two data points for that individual cell, use a summarization function (e.g. mean or median) to reduce it to a single value first, then run PCA.

(Also, judging by the error message, the dtype of those fields is str, so remember to convert them to a numeric type.)
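
A minimal sketch of that approach, assuming the multi-valued cells are stored as strings like '[1, 3]' (which is what the error message suggests) and that the mean is an acceptable summary. The toy frame and the helper name `cell_to_float` are made up for illustration; `z` stands in for the frame from the question.

```python
import ast

import numpy as np
import pandas as pd
from sklearn.decomposition import PCA


def cell_to_float(cell):
    """Collapse one cell to a single float; list-like cells are averaged."""
    if isinstance(cell, str):
        # '[1, 3]' -> [1, 3], '0' -> 0 (safe parsing of Python literals)
        cell = ast.literal_eval(cell.strip())
    if isinstance(cell, (list, tuple)):
        return float(np.mean(cell))  # swap in np.median if that fits better
    return float(cell)


# Hypothetical toy data mimicking the question: ints mixed with string lists.
z = pd.DataFrame(
    {
        "a": [0, 3, 0, 1],
        "b": [2, "[1, 3]", 0, 5],
        "c": [0, 4, "[2, 4]", 0],
    }
)

# Apply the helper to every cell, column by column, giving an all-float frame.
z_clean = z.apply(lambda col: col.map(cell_to_float))

# With one numeric value per cell, PCA can now fit.
data = PCA(n_components=2).fit_transform(z_clean)
```

Whether mean, median, or something else is the right summary depends on what the multiple values in a cell actually represent.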

Upvotes: 2
