Kelly Edwards

Reputation: 1

How to perform a PCA by reading in ONLY the columns of a dataset that have numeric data?

I am trying to do a PCA of monthly temperatures, but I am given a dataset that has more columns than just the monthly data. How do I only read in the month columns to perform the PCA? Here is everything I have so far:

dat_TEMP=read.table("TEMPERATURE.csv",header=TRUE, sep=";", dec=",",row.names=1)
attach(dat_TEMP)
df=data.frame(January,February,March,April,May,June,July,August,September,October,November,December)
dat.pca=prcomp(df,dat_TEMP,center=T,scale=T)

but when I try to run that last line it gives me this error: "Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric"

Can anyone help me with this? What do I need to do to just read out the month columns?

Upvotes: 0

Views: 1338

Answers (1)

Learner_seeker

Reputation: 544

First, make sure the columns you expect to be numeric aren't being read in as character or factor. Once they are numeric, you can subset the data to its numeric columns and run the PCA on that subset.
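As a rough sketch of that first check, assuming the month values ended up as text because of comma decimals (the `raw` data frame and its values below are made up for illustration, not taken from TEMPERATURE.csv):

```r
# Hypothetical data: month columns that were read in as character
raw <- data.frame(
  January  = c("1,5", "2,25", "-0,75"),
  February = c("3,0", "4,5", "5,25"),
  stringsAsFactors = FALSE
)

# Inspect how each column was read
sapply(raw, class)

# Convert comma-decimal text to numeric; as.character() also covers factors
fixed <- as.data.frame(lapply(raw, function(col) {
  as.numeric(sub(",", ".", as.character(col), fixed = TRUE))
}))

sapply(fixed, class)
```

If `sapply(dat_TEMP, class)` already reports "numeric" for every month column, this conversion is not needed.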

There are several ways to subset the data to only its numeric columns.

Using select_if() from dplyr (in dplyr 1.0.0 and later, select(data, where(is.numeric)) is the preferred form):

library("dplyr")
data.numeric=select_if(data, is.numeric)

Using sapply() from base R:

colnums <- sapply(data, is.numeric)
data[ , colnums]

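Alternatively, compare the column classes directly. Note that this keeps only columns of class "numeric", so integer columns are dropped: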

data[, sapply(data, class) == "numeric"]
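Putting it together, here is a minimal sketch of the whole workflow on a toy data frame. The `temps` data, its column names, and its values are invented stand-ins for the real file, so treat this as an illustration rather than the exact fix for your dataset:

```r
# Hypothetical toy data standing in for TEMPERATURE.csv
temps <- data.frame(
  Station  = c("A", "B", "C", "D"),
  Region   = factor(c("north", "south", "north", "south")),
  January  = c(1.2, 3.4, -0.5, 2.1),
  February = c(2.0, 4.1, 0.3, 2.9),
  March    = c(5.5, 7.2, 4.0, 6.3),
  stringsAsFactors = FALSE
)

# Keep only the numeric columns; drop = FALSE keeps a data frame
# even if a single column is selected
is_num <- sapply(temps, is.numeric)
temps_num <- temps[, is_num, drop = FALSE]

# Pass only the numeric data frame to prcomp().
# The scaling argument is spelled `scale.` (with a trailing dot).
pca <- prcomp(temps_num, center = TRUE, scale. = TRUE)
summary(pca)
```

Note that prcomp() takes the data as its first argument only. In the call from the question, the second positional argument (dat_TEMP) is not treated as data, so it is probably best to remove it.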

Upvotes: 2
