Reputation: 763
I've been trying to run a PCA in R with 'princomp'. My data (plan0) is a data frame with a lot of NAs, so I do
plan0_sna <- na.omit(plan0)
The resulting data is here. When I try to run the following, I get an error:
m2 <- princomp(plan0_sna, cor=TRUE)
Error in cov.wt(z) : 'x' must contain finite values only
So I need to convert the data frame to a matrix:
matrix0 <- data.matrix(plan0_sna)
but the resulting matrix no longer shows the name of the state:
head(matrix0)
   LOCAL PM2.5   BC         Al        Si
2      1    21  5.9 0.02278234 0.2993741
3      1    22  7.6 0.06149135 0.1828806
12     1    28 18.4 0.01614913 0.1905879
17     1    31 18.5 0.04290772 0.1603130
18     1    26  8.5 0.03344481 0.4836519
19     1    35 14.1 0.11562827 0.3842194
I'd like to do the analysis without losing the name of the state, as with the USArrests data:
> USArrests
               Murder Assault UrbanPop Rape
Alabama          13.2     236       58 21.2
Alaska           10.0     263       48 44.5
Arizona           8.1     294       80 31.0
Arkansas          8.8     190       50 19.5
California        9.0     276       91 40.6
Colorado          7.9     204       78 38.7
Connecticut       3.3     110       77 11.1
Delaware          5.9     238       72 15.8
Florida          15.4     335       80 31.9
Georgia          17.4     211       60 25.8
Hawaii            5.3      46       83 20.2
Idaho             2.6     120       54 14.2
Illinois         10.4     249       83 24.0
Indiana           7.2     113       65 21.0
Iowa              2.2      56       57 11.3
Kansas            6.0     115       66 18.0
Kentucky          9.7     109       52 16.3
Louisiana        15.4     249       66 22.2
Maine             2.1      83       51  7.8
Maryland         11.3     300       67 27.8
Massachusetts     4.4     149       85 16.3
Michigan         12.1     255       74 35.1
Minnesota         2.7      72       66 14.9
Mississippi      16.1     259       44 17.1
Missouri          9.0     178       70 28.2
Montana           6.0     109       53 16.4
Nebraska          4.3     102       62 16.5
Nevada           12.2     252       81 46.0
New Hampshire     2.1      57       56  9.5
New Jersey        7.4     159       89 18.8
New Mexico       11.4     285       70 32.1
New York         11.1     254       86 26.1
North Carolina   13.0     337       45 16.1
North Dakota      0.8      45       44  7.3
Ohio              7.3     120       75 21.4
Oklahoma          6.6     151       68 20.0
Oregon            4.9     159       67 29.3
Pennsylvania      6.3     106       72 14.9
Rhode Island      3.4     174       87  8.3
South Carolina   14.4     279       48 22.5
South Dakota      3.8      86       45 12.8
Tennessee        13.2     188       59 26.9
Texas            12.7     201       80 25.5
Utah              3.2     120       80 22.9
Vermont           2.2      48       32 11.2
Virginia          8.5     156       63 20.7
Washington        4.0     145       73 26.2
West Virginia     5.7      81       39  9.3
Wisconsin         2.6      53       66 10.8
Wyoming           6.8     161       60 15.6
Why is that different?
Upvotes: 1
Views: 1698
This should probably go to Stack Overflow, but I took the data from Google Drive anyway. This should work with your data:
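# Read the data and drop rows containing NA
plan0 <- read.table("plan0.txt", header = TRUE)
plan0_sna <- na.omit(plan0)
# Convert to a numeric matrix, then drop the first column (LOCAL)
matrix0 <- data.matrix(plan0_sna)
matrix1 <- matrix0[, 2:ncol(matrix0)]
# Use the LOCAL labels as row names (matrices allow duplicates)
rownames(matrix1) <- as.character(plan0_sna$LOCAL)
# PCA on the correlation matrix; the biplot shows the row names
m2 <- princomp(matrix1, cor = TRUE)
biplot(m2)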
When you convert the data frame (plan0_sna) to a matrix (matrix0) with data.matrix, factors such as the column LOCAL are automatically replaced by their internal integer codes. Therefore, there is still a column called LOCAL in matrix0, but it only contains numbers, not the names.
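The conversion is easy to see on a small made-up data frame. This is only a sketch: the column names mirror the question, but the data frame toy, its site labels, and its values are invented for illustration.

```r
# Toy data (invented values); LOCAL is a factor, as in the question
toy <- data.frame(
  LOCAL = factor(c("siteC", "siteC", "siteB", "siteA")),
  PM2.5 = c(21, 22, 28, 31),
  BC    = c(5.9, 7.6, 18.4, 18.5)
)

toy_matrix <- data.matrix(toy)

# The factor has been replaced by its integer codes.
# Levels are sorted alphabetically: siteA = 1, siteB = 2, siteC = 3.
print(toy_matrix[, "LOCAL"])

# The labels themselves are still available from the data frame
print(levels(toy$LOCAL))
```

The matrix keeps the column name LOCAL, which is why the output in the question looks as if the names were simply lost.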
Furthermore, there are several rows for each level of LOCAL, so you can't use the LOCAL column as the row names of the data frame, since data frames do not allow duplicated row names. A matrix, however, does allow them.
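A quick way to check the difference between the two structures, again on invented data (site_labels, values, m and df are hypothetical names used only for this sketch):

```r
# Toy data (invented values) with a repeated label
site_labels <- c("siteA", "siteA", "siteB")
values <- matrix(c(21, 22, 28, 5.9, 7.6, 18.4), ncol = 2,
                 dimnames = list(NULL, c("PM2.5", "BC")))

# A matrix accepts duplicated row names
m <- values
rownames(m) <- site_labels
print(m)

# A data frame refuses them: the assignment raises an error,
# which is caught here so the script keeps running
df <- as.data.frame(values)
result <- tryCatch(
  suppressWarnings({
    rownames(df) <- site_labels
    "accepted"
  }),
  error = function(e) "rejected"
)
print(result)
```

This is also why USArrests can stay a data frame: each state appears exactly once, so its row names are unique.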
So, you can first drop the column LOCAL from your matrix0 and store the result as matrix1. Then you can assign the LOCAL values as the row names of matrix1. This lets you run the PCA on numerical data only and still get the names on the resulting biplot.
Upvotes: 2