How to fix "Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric" for PCA - prcomp?

Question

I encountered this error while trying to run a PCA with the prcomp function on one of my datasets.

The code I use is:

data(iris)
myPr <- prcomp(iris[, -5], scale = TRUE)
PCA <- cbind(iris, myPr$x)

followed by a ggplot2 part for the graph. In this example, iris is a data.frame with 4 numeric columns and a 5th categorical column (Species, a factor).

Sepal.Length	Sepal.Width	Petal.Length	Petal.Width	Species
Min. :4.300	Min. :2.000	Min. :1.000	Min. :0.100	setosa :50
1st Qu.:5.100	1st Qu.:2.800	1st Qu.:1.600	1st Qu.:0.300	versicolor:50
Median :5.800	Median :3.000	Median :4.350	Median :1.300	virginica :50
Mean :5.843	Mean :3.057	Mean :3.758	Mean :1.199
3rd Qu.:6.400	3rd Qu.:3.300	3rd Qu.:5.100	3rd Qu.:1.800
Max. :7.900	Max. :4.400	Max. :6.900	Max. :2.500

I take the 5th column out for prcomp (as expected) and it works just fine. But then I tried another dataset. I do the same conversions except for the scaling (as it is not needed), and I have to remove more columns (columns 1-6, which are character, i.e. categorical variables). The code in question is as follows:

library(readxl)  # provides read_xlsx()
DATASET_PCA_MERGED <- read_xlsx("C:/Users/i5/Desktop/TCGA MERGED Desktop.xlsx")
PCA_Input <- DATASET_PCA_MERGED[, -(1:6)]  # drop the six categorical columns
myPr <- prcomp(PCA_Input)

Note that checking the class of PCA_Input gives data.frame, the same as iris in the test example. In this case prcomp fails with:

Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric

> class(PCA_Input)
[1] "tbl_df"     "tbl"        "data.frame"

All columns are imported as character even though they are numeric in Excel (I tested importing from CSV too, same issue).

So I tried converting to numeric in several ways following different posts here, but all of them required the object to be unlisted, which can't be done if I'm going to run a PCA afterwards, as I need to keep the structure of the data frame. Can someone help me convert the columns to numeric while keeping the data frame structure?
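One possible way to do this conversion, as a minimal sketch: assigning into `PCA_Numeric[]` replaces the columns one by one while keeping the data frame (or tibble) structure, so nothing has to be unlisted. The small `PCA_Input` built here is a hypothetical stand-in for the imported sheet, and `PCA_Numeric` is an illustrative name. Any cell that cannot be parsed as a number becomes NA (with a warning), and such cells would still need handling before prcomp.

```r
# Hypothetical stand-in for the imported sheet: numbers stored as text
PCA_Input <- data.frame(
  gene_a = c("1.5", "2.0", "3.25"),
  gene_b = c("10", "20", "30"),
  stringsAsFactors = FALSE
)

# Convert every column to numeric. Assigning into PCA_Numeric[] keeps the
# data frame structure and the column names.
PCA_Numeric <- PCA_Input
PCA_Numeric[] <- lapply(PCA_Input, function(col) as.numeric(as.character(col)))

str(PCA_Numeric)            # all columns should now be reported as num
myPr <- prcomp(PCA_Numeric) # no longer fails with "'x' must be numeric"
```

If the conversion produces many NAs, the text columns probably contain non-numeric markers, which might also explain why the import guessed character in the first place.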

Taking another approach, I used the raw data in .txt:

LGG_EXP <- read.delim("C:/Users/i5/Dropbox/Guilherme Vergara/Doutorado/Data/Datasets/TCGA LGG/EXP_DATA.txt")
na.omit(LGG_EXP)           # note: the result is not assigned, so LGG_EXP is unchanged
PCA <- LGG_EXP[, -c(1,2)]  # drop the first two columns
myPr <- prcomp(PCA)

And then I get a new error:

Error in svd(x, nu=0, nv=k) : Infinite or missing values in 'x'

Looking through other posts here, I tried:

> all(is.finite(unlist(PCA)))
[1] FALSE

So I have some non-finite values in this dataset (infinite, or missing, since is.finite() is also FALSE for NA and NaN). I'm not sure how to proceed here: either locate them for removal, or take another approach.

> sapply(PCA, "is.infinite"(PCA))
Error in is.infinite(PCA) : 
  default method not implemented for type 'list'

> sapply(PCA, "is.infinite"(unlist(PCA)))
Error in match.fun(FUN) : 
  'is.infinite(unlist(PCA))' is not a function, character or symbol

I didn't go any further from this as I'm not sure what the problem is and I'm clearly not using the sapply function correctly. In addition, I'd like to solve it without needing access to the .txt files (as this will be a recurrent problem in my line of work). Can someone please help me with this?
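For reference, a minimal sketch of how the non-finite values might be located and dropped. sapply expects a function as its second argument, not the result of a call, which is what the two attempts above passed. The small `PCA` data frame here is a hypothetical stand-in with one NA and one Inf, and the names `bad_per_column`, `bad_cells`, `keep` and `PCA_clean` are illustrative. The sketch assumes all columns are already numeric.

```r
# Hypothetical stand-in for the PCA data frame (numeric columns only)
PCA <- data.frame(
  g1 = c(1, 2, NA, 4, 5),
  g2 = c(2.5, Inf, 1, 0.5, 3),
  g3 = c(5, 3, 8, 1, 2)
)

# Pass a function to sapply(): count the non-finite entries in each column.
# is.finite() is FALSE for NA, NaN, Inf and -Inf.
bad_per_column <- sapply(PCA, function(col) sum(!is.finite(col)))
print(bad_per_column)

# Row and column positions of every non-finite cell
bad_cells <- which(!is.finite(as.matrix(PCA)), arr.ind = TRUE)
print(bad_cells)

# Keep only the rows in which every value is finite, then run the PCA
keep <- rowSums(!is.finite(as.matrix(PCA))) == 0
PCA_clean <- PCA[keep, , drop = FALSE]
myPr <- prcomp(PCA_clean)
```

Dropping rows is only one option. If the non-finite values are concentrated in a few columns, removing those columns instead may lose less data.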

Thanks in advance
