Reputation: 497
I have a pandas DataFrame that looks like this:
1 0 0 0 0 2 0 0 0 0 0 ... 0 0 1 2 0 0 0 0 0 0
2 0 0 0 3 0 0 0 4 0 0 ... 0 5 0 0 3 0 0 [1, 3] 0 0
3 0 0 0 0 0 0 0 2 0 0 ... 6 6 0 [2, 4] 0 2 0 0 0 0
4 1 0 6 1 0 0 0 0 0 0 ... 0 0 0 0 4 0 0 5 0 0
5 0 0 0 0 6 0 0 [2, 7] 0 0 ... 0 0 0 0 0 0 0 0 0 0
I'm trying to use PCA to reduce the dimensionality of my data, but some cells hold more than one value, like [2, 7].
When I run PCA, I get this error:
data = pca.fit_transform(z)
ValueError: could not convert string to float: '[1, 3]'
How do I handle this?
Upvotes: 1
Views: 3864
Reputation: 21284
Vanilla PCA doesn't work when observations have varying lengths.
If [1,3] means there were two data points for that individual cell, use a summarization function (e.g. mean or median) to establish a single value for that cell first, then run PCA.
(Also, it seems the dtype of those fields is str, so remember to convert them to a numeric type.)
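A minimal sketch of that approach, assuming the multi-valued cells are stored as strings such as '[1, 3]' (as the error message suggests) and that the mean is an acceptable summary. The helper name collapse_cell and the toy frame z are made up for illustration; they are not from the question's actual data.

```python
import ast

import numpy as np
import pandas as pd


def collapse_cell(value, reducer=np.mean):
    """Return one float for a cell that may hold a number, a list, or a string of either."""
    # Cells such as '[1, 3]' arrive as strings; parse them safely into Python objects.
    if isinstance(value, str):
        value = ast.literal_eval(value.strip())
    # Multi-valued cells are reduced to a single number (mean by default).
    if isinstance(value, (list, tuple)):
        return float(reducer(value))
    return float(value)


# Hypothetical toy frame mimicking the question: mostly scalars, a few multi-valued cells.
z = pd.DataFrame({
    "a": [0, 3, 0],
    "b": [2, "[1, 3]", "0"],
    "c": ["[2, 7]", 0, 4],
})

# Collapse every cell column by column, then force a float dtype.
z_numeric = z.apply(lambda col: col.map(collapse_cell)).astype(float)
print(z_numeric)
```

Passing reducer=np.median to collapse_cell swaps in the median instead. Once every column is float, the frame can be handed to pca.fit_transform as in the question.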
Upvotes: 2