doc elfein

Reputation: 27

Problems Plotting PCA in R with ggplot2

I am trying to plot a PCA of my data, and when I run the code I run into the following issues.

Furthermore, can anyone help me take my data and code and produce a PLS-DA plot like the one in the picture? I couldn't find any good tutorials.

How can I resolve this issue? The plots should look like this:

[image]

So after some help I got this far:

My code:


library(ggplot2)
library(ggforce)

all_datanoT <- cbind(amino,sphingo,hexose,phospha,lyso,cleaned_xl_Kopie)
all_datawT <- cbind(aminotnos,sphingo,hexose,phospha,lyso,cleaned_xl_Kopie)
rownames(all_datawT) <- sample_id$`Sample Identification`


alldata_naomit <-na.omit(all_datanoT)
all_datawTnaomit <-na.omit(all_datawT)

mypr <- prcomp(log2(alldata_naomit), scale = TRUE)
summary(mypr)

str(mypr)
mypr$x


PC1 <- mypr$x[, 1]
PC2 <- mypr$x[, 2]
pcat <- cbind(all_datawTnaomit, PC1, PC2)



ggplot(  
  data = pcat,
  aes(
    x = PC1,
    y = PC2,
    fill = 'Time point',
    line = 1
  ),
  shape = 1
) +
  geom_point(
    shape = 21,
    colour = "black",
    size = 2,
    stroke = 0.5,
    alpha = 0.6
  ) +
  scale_fill_brewer(palette = "Set1") +
  scale_color_brewer(palette = "Set1") +
  geom_mark_ellipse(
    aes(
      fill = 'Time point',
      color = 'Time point'
    ),
    alpha = 0.05
  ) 

This produces the following plot:

[image]

How can I get it to use the two different time point values (T0 and T1) so that there is one ellipse for each? And how can I easily impute my data so that the NAs are replaced by, for example, the column means, instead of omitting them just so I can plot?

Original sample data with dput():

dput(pcat[sample(nrow(pcat),50)])

https://gist.github.com/bicvn/47d97929a63ff99e9b260e8658407ae3

New dput:

https://gist.github.com/bicvn/b06279c6bfa641303b57a3ad2cc07a21

Upvotes: 0

Views: 1413

Answers (2)

Duck

Reputation: 39613

Also check this example. The trick is to use Comps <- as.data.frame(mypca$x) to isolate the components and then add them to the original data. After that, you can use cbind() with Comps[,c(1,2)] to bind only the first two components. Here I used the iris dataset:

library(ggplot2)
library(ggforce)
#Data
data("iris")
#PCA
mypca <- prcomp(iris[,-5])
#Isolate components
Comps <- as.data.frame(mypca$x)
#Extract components and bind to original data
newiris <- cbind(iris,Comps[,c(1,2)])
#Plot
ggplot(newiris, aes(x=PC1, y=PC2, col = Species, fill = Species)) +
  stat_ellipse(geom = "polygon", col= "black", alpha =0.5)+
  geom_point(shape=21, col="black")

Output:

[image]

For the data you shared, the only change is not to apply the NA action. Here is the code and output with the data you shared:

#Code
ggplot(pcat, aes(x=PC1, y=PC2, col = `Time point`, fill = `Time point`)) +
  stat_ellipse(geom = "polygon", col= "black", alpha =0.5)+
  geom_point(shape=21, col="black")

Output:

[image]
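
The question also asks how to replace NAs with column means instead of dropping rows. Below is a minimal base R sketch of that idea, not a tested solution for your data. The helper name impute_col_means and the toy data frame are made up for illustration, and it assumes that only numeric columns should be imputed:

```r
# Hypothetical helper (not from the original code): replace NAs in every
# numeric column with that column's mean; other columns are left untouched.
impute_col_means <- function(df) {
  is_num <- vapply(df, is.numeric, logical(1))
  df[is_num] <- lapply(df[is_num], function(col) {
    col[is.na(col)] <- mean(col, na.rm = TRUE)
    col
  })
  df
}

# Toy data standing in for the real measurements
toy <- data.frame(
  a   = c(1, NA, 3),
  b   = c(NA, 4, 6),
  grp = c("T0", "T1", "T0")
)

imputed <- impute_col_means(toy)
print(imputed)
```

With something like this, the imputed numeric data could be passed to prcomp() in place of the na.omit() result. Note that a column consisting only of NAs would end up as NaN, and that mean imputation reduces the variance of the imputed columns, which can affect the PCA.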

Upvotes: 2

dcarlson

Reputation: 11076

There seem to be discrepancies between your code and your output:

pcat <- cbind(all_datawT, mypr$x[, 1:2])

adds the first two columns of mypr$x to the data frame. But the output shows:

mypr$x[1:2]

which is the first two values of the matrix x. If you look at the column, you will see that those two values are repeated down the data. In R this is recycling and it is the default procedure when cbind is used to combine vectors that are of different lengths.
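
The difference between the two kinds of indexing, and the recycling, can be reproduced on a small made-up example. This is only a sketch: the objects df and m are invented stand-ins for all_datawT and mypr$x.

```r
# Toy stand-ins: a 6-row data frame and a 6 x 2 matrix of "scores"
df <- data.frame(id = 1:6)
m  <- matrix((1:12) / 10, nrow = 6,
             dimnames = list(NULL, c("PC1", "PC2")))

# m[1:2] is plain vector indexing: the first two VALUES of the matrix
first_two_values <- m[1:2]

# Binding a length-2 vector to 6 rows recycles it three times
bad <- cbind(df, m[1:2])

# m[, 1:2] selects the first two COLUMNS, one value per row
good <- cbind(df, m[, 1:2])

print(bad)
print(good)
```

Recycling happens silently here because 6 is a whole multiple of 2. If the number of rows were not a multiple of the vector length, cbind() on a data frame would stop with an error instead.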

The variables PC1 and PC2 are not found because you never created any object with those values, e.g.

PC1 <- mypr$x[, 1]
PC2 <- mypr$x[, 2]
pcat <- cbind(all_datawT, PC1, PC2)

That should work.

Upvotes: 1
