I am currently trying to plot a PCA for my data, and when I run the code I have the following issues.
Furthermore, can anyone help me take my data and code and produce a PLS-DA plot like the one in the picture? I couldn't find any good tutorials.
How can I resolve this issue? The plots should look like this:
So after some help I got this far:
My code:
library(ggplot2)
library(ggforce)
all_datanoT <- cbind(amino,sphingo,hexose,phospha,lyso,cleaned_xl_Kopie)
all_datawT <- cbind(aminotnos,sphingo,hexose,phospha,lyso,cleaned_xl_Kopie)
rownames(all_datawT) <- sample_id$`Sample Identification`
alldata_naomit <-na.omit(all_datanoT)
all_datawTnaomit <-na.omit(all_datawT)
mypr <- prcomp(log2(alldata_naomit), scale = TRUE)
summary(mypr)
str(mypr)
mypr$x
PC1 <- mypr$x[, 1]
PC2 <- mypr$x[, 2]
pcat <- cbind(all_datawTnaomit, PC1, PC2)
ggplot(
  data = pcat,
  aes(
    x = PC1,
    y = PC2,
    fill = 'Time point',
    line = 1
  ),
  shape = 1
) +
  geom_point(
    shape = 21,
    colour = "black",
    size = 2,
    stroke = 0.5,
    alpha = 0.6
  ) +
  scale_fill_brewer(palette = "Set1") +
  scale_color_brewer(palette = "Set1") +
  geom_mark_ellipse(
    aes(
      fill = 'Time point',
      color = 'Time point'
    ),
    alpha = 0.05
  )
which produces the following plot:
How can I get it to use the two different time values (T0 and T1) for two separate ellipses? And how can I easily impute my data so that the NAs are replaced by, for example, the column means, instead of omitting those rows just so I can plot?
Original sample data via dput():
dput(pcat[sample(nrow(pcat), 50), ])
https://gist.github.com/bicvn/47d97929a63ff99e9b260e8658407ae3
New dput:
https://gist.github.com/bicvn/b06279c6bfa641303b57a3ad2cc07a21
Also check this; I included an example here. The trick is to use Comps <- as.data.frame(mypca$x) to isolate the components and then add them to the original data with cbind(), using Comps[,c(1,2)] to extract only the first two components. Here, I used the iris dataset:
library(ggplot2)
library(ggforce)
#Data
data("iris")
#PCA
mypca <- prcomp(iris[,-5])
#Isolate components
Comps <- as.data.frame(mypca$x)
#Extract components and bind to original data
newiris <- cbind(iris,Comps[,c(1,2)])
#Plot
ggplot(newiris, aes(x=PC1, y=PC2, col = Species, fill = Species)) +
stat_ellipse(geom = "polygon", col= "black", alpha =0.5)+
geom_point(shape=21, col="black")
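The iris example splits into groups because Species is passed to aes() unquoted. The following is a small sketch of why the ellipses in the question do not split by time point, assuming a toy data frame with hypothetical values whose column name contains a space, like `Time point`. With single quotes the aesthetic is mapped to a constant string; with backticks it is mapped to the column. ggplot2's layer_data() is used here only to count the groups each plot builds.

```r
library(ggplot2)

# Hypothetical stand-in for the question's data: a column name with a space in it
df <- data.frame(
  PC1 = c(1, 2, 3, 4, 5, 6),
  PC2 = c(2, 1, 4, 3, 6, 5),
  `Time point` = c("T0", "T0", "T0", "T1", "T1", "T1"),
  check.names = FALSE  # keep the space in the column name
)

# Single quotes: 'Time point' is a constant string, so every point shares one fill/group
p_quoted <- ggplot(df, aes(x = PC1, y = PC2, fill = 'Time point')) +
  geom_point(shape = 21)

# Backticks: `Time point` refers to the column, so points split into T0 and T1
p_backtick <- ggplot(df, aes(x = PC1, y = PC2, fill = `Time point`)) +
  geom_point(shape = 21)

# Count the distinct groups ggplot2 built for the first layer of each plot
n_quoted   <- length(unique(layer_data(p_quoted)$group))
n_backtick <- length(unique(layer_data(p_backtick)$group))
print(c(quoted = n_quoted, backtick = n_backtick))
```

Since stat_ellipse() and geom_mark_ellipse() draw one ellipse per group, the quoted version can only ever produce a single ellipse.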
Output:
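With the data you shared, just skip the NA handling (na.omit()) step. Here are the code and output with your data. Note the backticks around `Time point`: in the question's code, 'Time point' in single quotes is a constant string rather than a reference to the column, so all points end up in one group and only one ellipse is drawn.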
#Code
ggplot(pcat, aes(x=PC1, y=PC2, col = `Time point`, fill = `Time point`)) +
stat_ellipse(geom = "polygon", col= "black", alpha =0.5)+
geom_point(shape=21, col="black")
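For the imputation part of the question, here is a minimal base-R sketch of column-mean imputation. `impute_col_means` is a hypothetical helper written for this example, not a function from any package, and the toy values are made up. Only numeric columns are touched; a column that is entirely NA would become NaN, and mean imputation shrinks each column's variance, which can affect a PCA.

```r
# Hypothetical helper: replace each NA in a numeric column with that column's mean
impute_col_means <- function(df) {
  num_cols <- vapply(df, is.numeric, logical(1))
  df[num_cols] <- lapply(df[num_cols], function(col) {
    col[is.na(col)] <- mean(col, na.rm = TRUE)
    col
  })
  df
}

# Toy example with hypothetical values; the non-numeric column is left as-is
toy <- data.frame(a = c(1, NA, 3), b = c(NA, 4, 8), group = c("T0", "T1", "T1"))
filled <- impute_col_means(toy)
print(filled)
```

Assuming every column of all_datanoT is numeric, something like prcomp(log2(impute_col_means(all_datanoT)), scale = TRUE) would then keep all rows, so the scores stay aligned with the sample metadata.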
Output:
There seem to be discrepancies between your code and your output:
pcat <- cbind(all_datawT, mypr$x[, 1:2])
adds the first two columns of mypr$x to the data frame. But the output shows:
mypr$x[1:2]
which is the first two values of the matrix x. If you look at that column, you will see those two values repeated down the data. In R this is called recycling, and it is the default behaviour when cbind() combines vectors of different lengths.
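A short sketch of the difference, using a small hypothetical matrix in place of mypr$x: indexing with [, 1:2] selects columns, while [1:2] selects the first two elements in column-major order, and binding that length-2 vector to a 4-row data frame recycles it.

```r
# Hypothetical 4 x 2 matrix standing in for mypr$x
x <- matrix(c(10, 20, 30, 40, 50, 60, 70, 80), ncol = 2)
colnames(x) <- c("PC1", "PC2")

x[, 1:2]  # the first two COLUMNS: a 4 x 2 matrix
x[1:2]    # the first two ELEMENTS (column-major order): 10, 20

dat <- data.frame(id = 1:4)

# Length-2 vector bound to a 4-row data frame: recycled to 10, 20, 10, 20
wrong <- cbind(dat, pc = x[1:2])
# Binding the matrix columns adds one full-length column per component
right <- cbind(dat, x[, 1:2])
print(wrong)
print(right)
```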
The variables PC1 and PC2 are not found because you never created any objects holding those values, e.g.:
PC1 <- mypr$x[, 1]
PC2 <- mypr$x[, 2]
pcat <- cbind(all_datawT, PC1, PC2)
That should work.
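One further caveat: in the question, the PCA runs on na.omit(all_datanoT) but the scores are bound to na.omit(all_datawT), and the two calls may drop different rows. A sketch that avoids this by dropping incomplete rows once and using the same rows for both the PCA and the metadata, assuming iris with a few NAs injected as hypothetical toy data:

```r
# Hypothetical toy data: iris with a few NAs injected
dat <- iris
dat[c(3, 10), "Sepal.Length"] <- NA
dat[25, "Petal.Width"] <- NA

# Drop incomplete rows ONCE, so the PCA input and the metadata share the same rows
complete <- na.omit(dat)
pca <- prcomp(complete[, 1:4], scale. = TRUE)

# Named objects PC1/PC2 become columns named PC1/PC2 in the bound data frame
PC1 <- pca$x[, 1]
PC2 <- pca$x[, 2]
pcat <- cbind(complete, PC1, PC2)
str(pcat)
```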