dreww2

Reputation: 1611

PCA and Constant-Zero Column Error

I have a question about PCA using the caret package and an error message I'm getting, "cannot rescale a constant/zero column to unit variance".

Consider two sets of similar code. The first works just fine:

library(caret)  # provides preProcess()

a = c(0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, -1, -1, NA)
b = c(1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, -1, -1, NA)
c = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,  1,  0,  0)

df = data.frame(a, b, c)

trans = preProcess(df, method = c("center", "scale", "pca"))

The variance of each column can be seen as:

apply(df, 2, var, na.rm=TRUE)

Note that the variance of column "c" is about 0.11.

Let's say I change the second-to-last value in column "c" to 1 instead of 0, and then run the same code:

a = c(0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, -1, -1, NA)
b = c(1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, -1, -1, NA)
c = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,  1,  1,  0)

df = data.frame(a, b, c)

trans = preProcess(df, method = c("center", "scale", "pca"))

I get an error message:

Error in prcomp.default(x, scale = TRUE, retx = FALSE) : 
  cannot rescale a constant/zero column to unit variance

If you look at the variance for column c, it's 0.059:

apply(df, 2, var, na.rm=TRUE)

Can anyone please help me understand the difference between these two sets of code and why the second gives an error when the first does not?

Thank you

Upvotes: 2

Views: 5238

Answers (1)

davechilders

Reputation: 9123

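PCA uses only complete observations. In your second definition of df, the last row is dropped because a and b are NA there, and column c is constant (all 1s) within the remaining 16 rows, so it cannot be rescaled to unit variance. In your first definition, row 16 still has c = 0 after the last row is dropped, so the column keeps a nonzero variance. The variances you computed with na.rm=TRUE use all 17 values of c, which is why they look fine in both cases.

Note: my answer applies to PCA in general and is not specific to the caret package.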
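To check this yourself, here is a minimal sketch that recomputes the column variances on complete rows only. It assumes the PCA step sees just the rows returned by complete.cases(); the helper name complete_case_var and the vectors c_first and c_second are made up for illustration and are not part of caret.

```r
# Columns a and b are the same in both versions of the question's data
a <- c(0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, -1, -1, NA)
b <- c(1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, -1, -1, NA)

# Column c from the first (working) and second (failing) examples
c_first  <- c(rep(1, 15), 0, 0)
c_second <- c(rep(1, 16), 0)

# Hypothetical helper: variance of each column after dropping
# every row that contains an NA
complete_case_var <- function(c_col) {
  df <- data.frame(a, b, c = c_col)
  sapply(df[complete.cases(df), ], var)
}

v_first  <- complete_case_var(c_first)   # c still varies (row 16 is 0)
v_second <- complete_case_var(c_second)  # c is all 1s, so its variance is 0

print(v_first)
print(v_second)
```

If that assumption holds, one way around the error is to drop constant columns before running PCA, or to deal with the NA rows first so that the one row where c differs is not thrown away.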

Upvotes: 3
