Reputation: 53
I have run a Moran's I analysis, which looks for spatial autocorrelation among features. The analysis was done using the correlog function in the ncf R package and used the first 3 principal components generated from genetic data. The results of that analysis are shown below.
distance=c(2.806063,8.208133,14.03604,19.03151,24.44091, 2.806063, 8.208133,14.03604,19.03151,24.44091,2.806063,8.208133,14.03604,19.03151,24.44091 )
correlation=c(-0.006933,0.029481,-0.071406,0.038319,-0.049990,0.006267,0.055945,-0.048551,-0.035062,-0.031578,0.022629,-0.065584,0.000986,-0.052754,0.0424931)
component=c(PC1,PC1,PC1,PC1,PC1,PC2,PC2,PC2,PC2,PC2,PC3,PC3,PC3,PC3,PC3)
data1<-data.frame(distance,correlation,component)
I then used ggplot to plot the results
library(ggplot2)
ggplot(data1,aes(x=data1$distance,y=data1$correlation,group=component,colour=component))+theme_classic()+ geom_line(size=1)+geom_point(size=1.5)
What I would now like to do is compute the 95% confidence intervals for each of the principal components and draw them on the ggplot, using a faint shading for the confidence area around each line and keeping the different line colours representing the different PCs. Unfortunately, I am completely stuck and don't know how to go about doing this. Any help will be highly appreciated.
Upvotes: 0
Views: 1446
Reputation: 59355
Your code doesn't run as is, which is why no one has bothered to respond for the last 10 hours.
Assuming you mean:
component=c("PC1","PC1","PC1","PC1","PC1","PC2","PC2","PC2","PC2","PC2","PC3","PC3","PC3","PC3","PC3")
and that you want the 95% CL for the correlation vs. distance, this will provide it:
library(ggplot2)
ggplot(data1, aes(x=distance, y=correlation, color=component)) +
  geom_line(size=1) +
  geom_point(size=1.5) +
  stat_smooth(aes(fill=component), alpha=.2,
              method=lm, formula=y~1, se=TRUE, level=0.95) +
  theme_classic()
The main addition is the stat_smooth(...) line, which smooths the correlation vs. distance data using a linear model having only the constant term, so the fitted line for each component is its mean correlation and the shaded band is the 95% confidence interval of that mean. Note that level=0.95 and se=TRUE are the defaults, so those clauses are not really necessary in this case.
Also, the expressions in the call to aes(...) should reference columns of data1 (so x=distance, not x=data1$distance), and you do not need the group=... clause if color=... uses the same grouping variable.
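To see what the shaded band from the constant-only fit represents, here is a minimal base R sketch that computes the same kind of interval by hand: the mean of each component's correlations plus or minus a t quantile times the standard error. The helper name mean_ci and the object ci_by_component are made up for illustration; the data are the values from the question with the component labels quoted. Because the model has no distance term, the band is flat across distance.

```r
# Data from the question, with the component labels quoted
distance <- rep(c(2.806063, 8.208133, 14.03604, 19.03151, 24.44091), times = 3)
correlation <- c(-0.006933, 0.029481, -0.071406, 0.038319, -0.049990,
                 0.006267, 0.055945, -0.048551, -0.035062, -0.031578,
                 0.022629, -0.065584, 0.000986, -0.052754, 0.0424931)
component <- rep(c("PC1", "PC2", "PC3"), each = 5)
data1 <- data.frame(distance, correlation, component)

# Hypothetical helper: t-based confidence interval of the mean,
# i.e. mean +/- qt(level/2 + 0.5, n - 1) * sd / sqrt(n)
mean_ci <- function(y, level = 0.95) {
  n <- length(y)
  half <- qt(level / 2 + 0.5, df = n - 1) * sd(y) / sqrt(n)
  c(mean = mean(y), lower = mean(y) - half, upper = mean(y) + half)
}

# One row per component, columns: mean, lower, upper
ci_by_component <- t(sapply(split(data1$correlation, data1$component), mean_ci))
print(ci_by_component)

# Cross-check against the intercept-only linear model for PC1
fit_pc1 <- lm(correlation ~ 1, data = subset(data1, component == "PC1"))
print(confint(fit_pc1, level = 0.95))
```

The PC1 row of ci_by_component should agree with the confint() output, since an intercept-only lm estimates the mean and its standard error.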
Upvotes: 1