Reputation: 2308
I have applied CountVectorizer() to my X_train and it returned a sparse matrix.
Usually, if we want to standardize a sparse matrix, we pass the with_mean=False parameter.
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler(with_mean=False)
X_train = scaler.fit_transform(X_train)
But in my case, after applying CountVectorizer to my X_train, I also performed PCA (TruncatedSVD) to reduce dimensions, so my data is no longer a sparse matrix.
Can I now apply StandardScaler() directly, without passing with_mean=False (i.e. with with_mean=True)?
Upvotes: 2
Views: 286
Reputation: 1614
If you take a look at what the with_mean parameter does, you'll find that it simply centers your data (subtracts each feature's mean) before scaling. The reason you don't center a sparse matrix is that subtracting the mean turns most of the zero entries into non-zero values, so the result is effectively a dense matrix that occupies much more memory, which defeats the purpose of using a sparse format in the first place.
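To make the memory argument concrete, here is a small sketch of what centering does to a sparse matrix. The matrix is a randomly generated stand-in (not real CountVectorizer output), and its shape and density are arbitrary assumptions chosen for illustration.

```python
import numpy as np
from scipy import sparse

# Hypothetical toy matrix: 1000 rows, 500 columns, about 1% non-zero entries.
X = sparse.random(1000, 500, density=0.01, format="csr", random_state=0)

# Centering subtracts each column's mean from every entry in that column,
# so an entry that was 0 becomes -mean and can no longer be skipped in storage.
col_means = np.asarray(X.mean(axis=0))   # shape (1, 500)
X_centered = X.toarray() - col_means     # dense ndarray, shape (1000, 500)

nnz_before = X.nnz
nnz_after = np.count_nonzero(X_centered)
print("non-zero entries before centering:", nnz_before)
print("non-zero entries after centering: ", nnz_after)
```

The count of stored values grows from roughly 1% of the entries to nearly all of them, which is the blow-up that with_mean=False avoids.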
After you perform PCA (TruncatedSVD), your data is a dense array with far fewer dimensions, so it can be centered before scaling without the memory problem. So yes, you can apply StandardScaler() directly, with its default with_mean=True.
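As a rough end-to-end sketch of the steps discussed here, assuming a tiny made-up corpus in place of X_train and an arbitrary n_components=3:

```python
import numpy as np
from scipy import sparse
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import StandardScaler

# Hypothetical toy corpus standing in for X_train.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats are pets",
    "the mat was red",
    "a red dog sat down",
    "pets sat on the red mat",
]

# CountVectorizer returns a scipy sparse matrix.
counts = CountVectorizer().fit_transform(docs)
print("sparse after CountVectorizer:", sparse.issparse(counts))

# TruncatedSVD accepts sparse input and returns a dense NumPy array.
reduced = TruncatedSVD(n_components=3, random_state=0).fit_transform(counts)
print("sparse after TruncatedSVD:", sparse.issparse(reduced))

# The input is dense now, so the default with_mean=True can be used.
scaled = StandardScaler().fit_transform(reduced)

# Each component should end up with mean close to 0 and std close to 1.
print(np.round(scaled.mean(axis=0), 6))
print(np.round(scaled.std(axis=0), 6))
```

On held-out data you would normally reuse the fitted objects and call transform() instead of fit_transform(), so the test set is scaled with the training statistics.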
Upvotes: 1