Anjith

Reputation: 2308

Can I standardize my count vectors after applying PCA?

I have applied CountVectorizer() on my X_train and it returned a sparse matrix.

Usually, if we want to standardize a sparse matrix, we pass the with_mean=False parameter:

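from sklearn.preprocessing import StandardScaler
scaler = StandardScaler(with_mean=False)  # skip centering so the matrix stays sparse
X_train = scaler.fit_transform(X_train)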

But in my case, after applying CountVectorizer to X_train, I have also performed PCA (via TruncatedSVD) to reduce the dimensions. Now my data is no longer a sparse matrix.

So can I now apply StandardScaler() directly without passing with_mean=False (i.e. with the default with_mean=True)?

Upvotes: 2

Views: 286

Answers (1)

panktijk

Reputation: 1614

If you take a look at what the with_mean parameter does, you'll find that it simply centers your data (subtracts each feature's mean) before scaling. You don't center a sparse matrix because subtracting the mean turns most of the zero entries into non-zero values, so the result is effectively a dense matrix that occupies much more memory, which defeats the purpose of using a sparse representation in the first place.
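A minimal sketch of that behaviour, assuming scikit-learn, SciPy and NumPy are installed. The small matrix is made-up data standing in for CountVectorizer output, and the variable names are only illustrative. It shows that scaling without centering keeps the result sparse, while scikit-learn refuses to center sparse input and raises a ValueError:

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import StandardScaler

# Toy sparse matrix standing in for CountVectorizer output (made-up values).
X_sparse = sparse.csr_matrix(
    np.array([[1, 0, 0], [0, 2, 0], [0, 0, 3], [1, 0, 1]], dtype=float)
)

# Scaling without centering leaves the zeros untouched, so the result stays sparse.
X_scaled = StandardScaler(with_mean=False).fit_transform(X_sparse)
stays_sparse = sparse.issparse(X_scaled)

# Asking for centering on sparse input is rejected with a ValueError.
try:
    StandardScaler(with_mean=True).fit_transform(X_sparse)
    centering_refused = False
except ValueError:
    centering_refused = True

print(stays_sparse, centering_refused)
```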

After you perform PCA (TruncatedSVD), your data has far fewer dimensions and is already a dense array, so it can now be centered before scaling. So yes, you can apply StandardScaler() directly, with the default with_mean=True.
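As a rough illustration of the whole sequence, here is a sketch assuming scikit-learn and NumPy are available. The toy corpus, n_components=2 and the variable names are assumptions made for the example, not values from the question:

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import StandardScaler

# Hypothetical toy corpus standing in for X_train.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
    "the mat is on the floor",
    "dogs chase cats",
    "a log is on the floor",
]

# CountVectorizer returns a sparse matrix of token counts.
counts = CountVectorizer().fit_transform(docs)

# TruncatedSVD accepts sparse input and returns a dense NumPy array.
svd = TruncatedSVD(n_components=2, random_state=0)
reduced = svd.fit_transform(counts)

# The reduced data is dense, so the default StandardScaler (with_mean=True) works.
scaler = StandardScaler()
scaled = scaler.fit_transform(reduced)

print(type(reduced).__name__, scaled.shape)
```

On real data you would fit the vectorizer, the SVD and the scaler on X_train only and call transform() on the test set with the same fitted objects.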

Upvotes: 1
