Reputation: 11366
I am trying to run a PCA using the princomp function in R.
The following is the example code:
mydf <- data.frame (
A = c("NA", rnorm(10, 4, 5)),
B = c("NA", rnorm(9, 4, 5), "NA"),
C = c("NA", "NA", rnorm(8, 4, 5), "NA")
)
out <- princomp(mydf, cor = TRUE, na.action=na.exclude)
Error in cov.wt(z) : 'x' must contain finite values only
I tried to remove the NA values from the dataset, but that does not work either.
ndnew <- mydf[complete.cases(mydf),]
A B C
1 NA NA NA
2 1.67558617743171 1.28714736288378 NA
3 -1.03388645096478 9.8370942023751 10.9522215389562
4 7.10494481721949 14.7686678743866 4.06560213642725
5 13.966212462717 3.92061729913733 7.12875100279949
6 -1.91566982754146 0.842774330179978 5.26042516598668
7 0.0974919570675357 5.5264365812476 6.30783046905425
8 12.7384749395121 4.72439301946042 2.9318845479507
9 13.1859349108349 -0.546676530952666 9.98938028956806
10 4.97278207223239 6.95942086859593 5.15901566720956
11 -4.10115142119221 NA NA
Even if I could remove the NAs, it might not help, because every row or column has at least one missing value. Is there any R method that can impute the data while doing PCA?
UPDATE: based on the answers:
> mydf <- data.frame (A = c(NA, rnorm(10, 4, 5)), B = c(NA, rnorm(9, 4, 5), NA),
+ C = c(NA, NA, rnorm(8, 4, 5), NA))
> out <- princomp(mydf, cor = TRUE, na.action=na.exclude)
Error in cov.wt(z) : 'x' must contain finite values only
ndnew <- mydf[complete.cases(mydf),]
out <- princomp(ndnew, cor = TRUE, na.action=na.exclude)
This works, but the default na.action does not.
Is there any method that can impute the data? In my real data almost every column has missing values, so omitting the NAs would leave me with roughly 0 rows or columns.
Upvotes: 3
Views: 18503
Reputation: 2667
The nipals package performs PCA with missing values and can return fitted values.
set.seed(1)
mydf <- data.frame (
A = c(NA, rnorm(10, 4, 5)),
B = c(NA, rnorm(9, 4, 5), NA),
C = c(NA, NA, rnorm(8, 4, 5), NA)
)
# Remove rows with all missing values
mydf <- mydf[ !apply(mydf, 1, function(x) all(is.na(x))), ]
mydf
library(nipals)
res <- nipals(mydf, fitted=TRUE)
# Look at fitted values
res$fitted
# Compare fitted and observed values
res$fitted-mydf
A B C
2 0.0062853910 0.0253433878 NA
3 -0.0005800986 0.0015428998 0.001829560
4 0.0046210396 -0.0019671275 -0.007074557
5 0.0062666341 0.0083711959 -0.001574603
6 -0.0034899784 0.0007386345 0.004800290
7 0.0018738600 -0.0097446464 -0.009368384
8 0.0003539155 0.0029634392 0.001720441
9 -0.0035414103 0.0021827218 0.005912196
10 -0.0028836774 0.0012138259 0.004404780
11 0.0001702055 NA NA
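As a rough sketch of how the fitted values could be used for imputation, the missing cells can be filled from res$fitted while the observed cells are left untouched. This assumes the nipals package is installed and that the result exposes scores, loadings and fitted components, as described in its documentation; the name filled is only illustrative.

```r
set.seed(1)
mydf <- data.frame(
  A = c(NA, rnorm(10, 4, 5)),
  B = c(NA, rnorm(9, 4, 5), NA),
  C = c(NA, NA, rnorm(8, 4, 5), NA)
)
# Drop rows where every value is missing
mydf <- mydf[rowSums(!is.na(mydf)) > 0, ]

filled <- NULL
if (requireNamespace("nipals", quietly = TRUE)) {
  res <- nipals::nipals(mydf, fitted = TRUE)
  # Scores and loadings of the fitted components
  print(res$scores)
  print(res$loadings)
  # Keep observed values, fill only the missing cells from the fit
  filled <- as.matrix(mydf)
  gaps <- is.na(filled)
  filled[gaps] <- res$fitted[gaps]
}
```

Whether values filled in this way are trustworthy depends on how much data is missing and on the number of components retained, so treat the result as a starting point rather than a validated imputation.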
Upvotes: 0
Reputation: 109844
It's because you used the character string "NA", which really isn't NA.
This demonstrates what I mean:
is.na("NA")  # FALSE: the string "NA" is ordinary text
is.na(NA)    # TRUE: a real missing value
I'd fix it at the creation level, but here's a way to fix it retroactively. Because you used the character "NA", the whole column becomes class character, so you'll also have to convert it with as.numeric:
FUN <- function(x) as.numeric(ifelse(x=="NA", NA, x))
mydf2 <- data.frame(apply(mydf, 2, FUN))
ndnew <- mydf[complete.cases(mydf2),]
ndnew
which yields:
A B C
3 11.3349957691175 6.97143301427903 -2.13578124048775
4 5.69035783905702 -2.44999550936244 -4.40642099309301
5 -0.865878644072023 6.03782080227184 9.83402859382248
6 6.58329959845638 5.67811450593805 12.4477770011262
7 0.759928613563254 16.6445809805028 9.45835418422973
8 11.3798459951171 1.36989010500538 0.784492783538675
9 0.671542080233918 5.9024564388189 16.2389092991422
10 3.64295741533713 9.78754135462621 -2.4293697924212
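A variation on the same fix, sketched here under the assumption that the data look like the question's example, converts each column with lapply so the result stays a numeric data frame that can go straight into princomp. The helper name to_numeric is made up for illustration.

```r
set.seed(1)
# Recreate the problem: the string "NA" turns every column into character
mydf <- data.frame(
  A = c("NA", rnorm(10, 4, 5)),
  B = c("NA", rnorm(9, 4, 5), "NA"),
  C = c("NA", "NA", rnorm(8, 4, 5), "NA"),
  stringsAsFactors = FALSE
)
sapply(mydf, class)

# Hypothetical helper: turn the text "NA" into a real NA, then convert to numeric
to_numeric <- function(x) {
  x <- as.character(x)
  x[x == "NA"] <- NA
  as.numeric(x)
}

mydf_num <- as.data.frame(lapply(mydf, to_numeric))
complete_rows <- mydf_num[complete.cases(mydf_num), ]

# PCA on the rows that have no missing values
out <- princomp(complete_rows, cor = TRUE)
summary(out)
```

Subsetting the converted data frame rather than the original keeps the columns numeric, which princomp requires.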
EDIT:==========================================================
"this works but the defult na.action do not work"
Don't know much about princomp but this works (not sure why the function's na.action doesn't):
out <- princomp(na.omit(mydf), cor = TRUE)
"Is there is any method that can impute the data, as in real data I have almost every column with missing value in them ? result of such na omit will give me ~ 0 rows or columns"
This really is a separate question from your first and you should start a new thread after researching the topic on your own a little bit.
Upvotes: 9
Reputation: 162321
For na.action to have an effect, you need to explicitly supply a formula argument:
princomp(formula = ~., data = mydf, cor = TRUE, na.action=na.exclude)
# Call:
# princomp(formula = ~., data = mydf, na.action = na.exclude, cor = TRUE)
#
# Standard deviations:
# Comp.1 Comp.2 Comp.3
# 1.3748310 0.8887105 0.5657149
The formula is needed because it triggers dispatch to princomp.formula, the only princomp method that does anything useful with na.action.
methods('princomp')
[1] princomp.default* princomp.formula*
names(formals(stats:::princomp.formula))
[1] "formula" "data" "subset" "na.action" "..."
names(formals(stats:::princomp.default))
[1] "x" "cor" "scores" "covmat" "subset" "..."
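To illustrate what na.action changes once the formula interface is used, the sketch below fits the same data with na.exclude and with na.omit. The object names fit_exclude and fit_omit are illustrative. Both fits use the same complete rows, so the components agree; the difference is that na.exclude pads the scores with NA rows for the incomplete observations.

```r
set.seed(1)
mydf <- data.frame(
  A = c(NA, rnorm(10, 4, 5)),
  B = c(NA, rnorm(9, 4, 5), NA),
  C = c(NA, NA, rnorm(8, 4, 5), NA)
)

fit_exclude <- princomp(~ ., data = mydf, cor = TRUE, na.action = na.exclude)
fit_omit    <- princomp(~ ., data = mydf, cor = TRUE, na.action = na.omit)

# Same fit, different shape of the scores matrix
dim(fit_exclude$scores)  # 11 rows: incomplete rows are kept as NA
dim(fit_omit$scores)     # 8 rows: incomplete rows are dropped
all.equal(fit_exclude$sdev, fit_omit$sdev)
```

na.exclude is convenient when the scores need to be lined up with the original rows, for example to bind them back onto mydf.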
Upvotes: 8