Reputation: 58
I'm having trouble with this PCA. PC1 results appear binary, and I can't figure out why as none of my variables are binary.
df = bees
library(dplyr)    # %>%, ungroup(), select()
library(ggplot2)  # needed for the plot further down

pca_dat_condition <- bees %>%
  ungroup() %>%
  select(Length.1:Length.25, OBJECTID, Local, Elevation, Longitude,
         Latitude, Cubital.Index) %>%
  na.omit()

# drop the identifying and location columns so only the measurements enter the PCA
pca_dat_first <- pca_dat_condition %>%
  select(-Local, -OBJECTID, -Elevation, -Longitude, -Latitude)

pca <- pca_dat_first %>%
  scale() %>%
  prcomp()
# add identifying information back into PCA data
pca_data <- data.frame(pca$x,
                       Local = pca_dat_condition$Local,
                       ID = pca_dat_condition$OBJECTID,
                       elevation = pca_dat_condition$Elevation,
                       Longitude = pca_dat_condition$Longitude,
                       Latitude = pca_dat_condition$Latitude)

ggplot(pca_data, aes(x = PC1, y = PC2, color = Latitude)) +
  geom_point() +
  ggtitle("PC1 vs PC2: All Individuals") +
  scale_colour_gradient(low = "blue", high = "red")
I'm not getting any error messages with the code, and when I look at the data frame nothing looks out of place. Should I be using a different function for the PCA? Any insight into why my graph may look like this?
Previously, I did the same PCA but for the average values for each Local (whereas this is each individual), and it came out as a normal PCA with no clear clustering. I don't understand why this problem would arise when looking at individual points. It's possible I merged some other data frames in a wonky way, but the structure of the dataset seems completely normal.
Upvotes: 1
Views: 86
Reputation: 8837
bees <- read.csv(paste0("https://gist.githubusercontent.com/AkselA/",
                        "08a4e78a6a29a918ed597e9a32adc228/raw/",
                        "6d0005fad4cb91830bcf7087176283b18683e9cd/bees.csv"),
                 header=TRUE)

# bees <- bees[bees[,1] < 10,] # This will remove the three offending rows

bees <- na.omit(bees)

# measurements plus the identifying/location columns
bees.cond <- bees[, grep("Length|OBJ|Loc|Ele|Lon|Lat|Cubi", colnames(bees))]
# measurements only; this is what goes into the PCA
bees.first <- bees[, grep("Length|Cubi", colnames(bees))]

# check the range of each variable for implausible values
summary(bees.first)
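# one small histogram per scaled variable; counts are log1p-transformed
# so that bins holding only a few extreme values remain visible
par(mfrow=c(7, 4), mar=rep(1, 4))
q <- lapply(1:ncol(bees.first), function(x) {
    h <- hist(scale(bees.first[, x]), plot=FALSE)
    h$counts <- log1p(h$counts)
    plot(h, main="", axes=FALSE, ann=FALSE)
    legend("topright", legend=names(bees.first[x]),
      bty="n", cex=0.8, adj=c(0, -2), xpd=NA)
})

# same PCA as in the question; scale.=TRUE standardises each column first
bees.pca <- prcomp(bees.first, scale.=TRUE)
biplot(bees.pca)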
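The commented-out filter above suggests that a handful of rows with extreme values are responsible. As a rough illustration of that effect, here is a minimal sketch on synthetic data (not the bees data). The names `dat`, `bad` and `flagged` and the z-score cutoff of 5 are assumptions made up for this example; a sensible cutoff for real measurements would have to be chosen by looking at the histograms.

```r
# Synthetic stand-in for the measurement columns (NOT the bees data).
set.seed(1)
n <- 200
dat <- as.data.frame(matrix(rnorm(n * 5, mean = 2, sd = 0.2), ncol = 5))
names(dat) <- paste0("Length.", 1:5)

# Corrupt three rows with values far outside the normal range
# (hypothetical, e.g. a unit mix-up during a merge).
bad <- c(10, 50, 120)
dat[bad, ] <- dat[bad, ] * 100

pca <- prcomp(dat, scale. = TRUE)

# Share of total variance carried by PC1. With the corrupted rows present,
# PC1 mostly separates "corrupted" from "everything else".
pc1_share <- pca$sdev[1]^2 / sum(pca$sdev^2)

# Flag rows whose largest absolute z-score exceeds the cutoff
# (5 is an arbitrary choice for this sketch).
z <- scale(dat)
flagged <- unname(which(apply(abs(z), 1, max) > 5))
print(flagged)

# Refit the PCA without the flagged rows.
pca_clean <- prcomp(dat[-flagged, ], scale. = TRUE)
```

Plotting `pca$x[, 1:2]` and `pca_clean$x[, 1:2]` side by side should show the difference: in the first fit the corrupted rows sit far from a tight clump of all other points, while the refit spreads the remaining points out normally.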
Upvotes: 2