I ran into an error when trying to run a PCA with the prcomp function on one of my datasets.
The code I use, shown here with the iris example, is:
data(iris)
myPr <- prcomp(iris[, -5], scale = TRUE)
PCA <- cbind(iris, myPr$x)
followed by a ggplot2 part for the graph. In this example, iris is a data.frame with 4 numeric columns and a 5th factor column (Species):
Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
---|---|---|---|---|
Min. :4.300 | Min. :2.000 | Min. :1.000 | Min. :0.100 | setosa :50 |
1st Qu.:5.100 | 1st Qu.:2.800 | 1st Qu.:1.600 | 1st Qu.:0.300 | versicolor:50 |
Median :5.800 | Median :3.000 | Median :4.350 | Median :1.300 | virginica :50 |
Mean :5.843 | Mean :3.057 | Mean :3.758 | Mean :1.199 | |
3rd Qu.:6.400 | 3rd Qu.:3.300 | 3rd Qu.:5.100 | 3rd Qu.:1.800 | |
Max. :7.900 | Max. :4.400 | Max. :6.900 | Max. :2.500 | |
I drop the 5th column for prcomp (as expected) and it works just fine. But then I tried another dataset. I do the same steps except for the scaling (which is not needed) and have to remove more columns (columns 1-6, which are character, i.e. categorical variables). The code in question is as follows:
DATASET_PCA_MERGED <- read_xlsx ("C:/Users/i5/Desktop/TCGA MERGED Desktop.xlsx")
PCA_Input <- (DATASET_PCA_MERGED[, -c(1,2,3,4,5,6)])
myPr <- prcomp(PCA_Input)
Note that the class of PCA_Input includes data.frame, the same as iris in the test example. In this case, prcomp fails with:
Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric
class(PCA_Input) [1] "tbl_df" "tbl" "data.frame"
All columns are imported as character even though they are numeric in Excel (I also tried importing from CSV, with the same result).
I tried to convert them to numeric in several ways following different posts here, but all of them required the object to be unlisted, which I can't do if I'm going to run a PCA afterwards, as I need to keep the structure of the data frame. Can someone help me convert the columns to numeric while keeping the data frame structure?
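To make the conversion being asked about concrete, here is a minimal sketch in base R. The small data frame and its column names (gene_a, gene_b) are made up for illustration and only stand in for PCA_Input; the idiom of assigning into `PCA_Input[]` should also work on a tibble, but that is an assumption worth checking on the real data.

```r
# Hypothetical stand-in for PCA_Input: numeric values that were imported as text.
PCA_Input <- data.frame(
  gene_a = c("1.5", "2.0", "3.25"),
  gene_b = c("10", "n/a", "30"),
  stringsAsFactors = FALSE
)

# Assigning into PCA_Input[] replaces the column contents but keeps the
# data frame structure, so no unlist() is needed.
# Entries that cannot be parsed as numbers become NA (warning suppressed here).
PCA_Input[] <- lapply(PCA_Input, function(col) {
  suppressWarnings(as.numeric(as.character(col)))
})

str(PCA_Input)

# Count how many values per column failed to convert.
na_per_column <- colSums(is.na(PCA_Input))
print(na_per_column)
```

Any column with a non-zero count in na_per_column contained text that was not a plain number, which may be why the import treated it as character in the first place.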
Taking another approach, I also tried the raw data from a .txt file:
DATASET <- read.delim("C:/Users/i5/Dropbox/Guilherme Vergara/Doutorado/Data/Datasets/TCGA LGG/EXP_DATA.txt")
na.omit(LGG_EXP)
PCA <- LGG_EXP[, -c(1,2)]
myPr <- prcomp(PCA)
This gives a new error: Error in svd(x, nu=0, nv=k) : Infinite or missing values in 'x'
Looking through other posts here, I tried: all(is.finite(unlist(PCA))) [1] FALSE
So this dataset contains some infinite or missing values. I'm not sure how to proceed here: either locate them for removal or take another approach.
> sapply(PCA, "is.infinite"(PCA))
Error in is.infinite(PCA) :
default method not implemented for type 'list'
> sapply(PCA, "is.infinite"(unlist(PCA)))
Error in match.fun(FUN) :
'is.infinite(unlist(PCA))' is not a function, character or symbol
I didn't go any further because I'm not sure what the problem is, and I'm clearly not using the sapply function correctly. In addition, I'd like to solve this without needing access to the .txt files (as this will be a recurring problem in my line of work). Can someone please help me with this?
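For reference, the sapply calls above fail because sapply expects a function as its second argument, not the result of calling one. A minimal sketch of locating and dropping non-finite values follows; the toy data frame (columns x, y, z) is invented for illustration and only stands in for the real PCA object.

```r
# Hypothetical stand-in for the PCA data frame, with one Inf and one NA.
PCA <- data.frame(
  x = c(1, 2, Inf, 4, 5, 6),
  y = c(5, NA, 7, 8, 6, 9),
  z = c(9, 10, 11, 12, 14, 13)
)

# sapply() takes a function and calls it on each column in turn.
bad_per_column <- sapply(PCA, function(col) sum(!is.finite(col)))
print(bad_per_column)

# Row/column positions of every non-finite cell (Inf, -Inf, NA, NaN).
bad_cells <- which(!is.finite(as.matrix(PCA)), arr.ind = TRUE)
print(bad_cells)

# Keep only rows where every value is finite. The result has to be assigned:
# functions such as na.omit() return a new object and do not modify the input.
keep <- rowSums(!is.finite(as.matrix(PCA))) == 0
PCA_clean <- PCA[keep, , drop = FALSE]

myPr <- prcomp(PCA_clean)
```

Whether dropping whole rows is appropriate depends on the data; removing affected columns or imputing values are alternatives this sketch does not cover.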
Thanks in advance