Reputation: 545
I have generated a scatter plot of my data using plot(data$pco$li[,1], data$pco$li[,2]). The result is a PCA scatter plot. I now want to colour each point according to its category (each point is a gene, and I want to colour it by the chromosome to which it belongs).
I have a file ready with genes in column one and chromosome in column two, and have loaded it into R using:
geneLoc <- read.table(file = "~/Location/File.txt", header = FALSE, sep = "\t")
colnames(geneLoc) <- c("Gene", "Chromosome")
From here I do not know how to use this information to colour the points on the scatter plot. The closest answer I found was here: Colouring scatter graph by type in r
However, my data for the scatter plot is not in the form of a two-column table (it is the result of a package called Treescape that conducts PCA). It is therefore in this format:
          gene1    gene2    gene3    gene4    gene5    gene6    gene7    gene8    gene9
gene2  33.76389
gene3  51.12729 47.74935
gene4  27.62245 31.38471 52.12485
gene5  33.92639 28.44293 53.74942 28.67054
gene6  32.28002 26.57066 43.72642 29.54657 25.51470
gene7  34.65545 30.08322 54.06478 30.59412 24.89980 27.00000
gene8  31.09662 27.44085 48.89785 27.49545 26.87006 24.59675 26.79552
gene9  36.20773 28.82707 50.94114 31.24100 24.53569 24.06242 25.41653 27.60435
gene10 36.53765 28.75761 53.86093 30.46309 23.62202 25.00000 27.82086 28.87906 25.33772
As such, I can't simply add a third (category) column to a two-column data frame and use it to colour my scatter plot.
Upvotes: 1
Views: 338
Reputation: 8458
You need to convert your data into the following format:
Var1 Var2 Value
gene1 gene2 33.76389
gene1 gene3 51.12729
You can then easily append a fourth column. The reshape2 package has a function called melt that will do the trick. First, let's generate a matrix similar to your example above:
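# Simulate a 9 x 9 matrix of values, labelled like the matrix in the question
mydata <- matrix(data = rnorm(81, 25, 10), ncol = 9, nrow = 9)
colnames(mydata) <- paste0("gene", 1:9)
rownames(mydata) <- paste0("gene", 2:10)
# Blank the cells above the diagonal; the diagonal itself holds real pairs (gene2 vs gene1, ...)
mydata[upper.tri(mydata)] <- NA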
Now we can use reshape2 to turn this into the "long" format I described above:
library(reshape2)
# One row per gene pair; na.rm = TRUE drops the blanked cells
meltdata <- melt(mydata, na.rm = TRUE)
You can now append a column to the right of meltdata for plotting. The ggplot2 library is good at plotting data structured in this format.
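As a rough sketch of that last step, the chromosome of each gene in a pair could be looked up by name with match(). The geneLoc table below is a made-up stand-in for the one read from File.txt (its chromosome labels are invented), and the column names Chromosome1 and Chromosome2 are just illustrative choices:

```r
library(reshape2)

# Toy lower-triangular matrix, labelled like the one in the question
set.seed(1)
mydata <- matrix(rnorm(81, 25, 10), ncol = 9, nrow = 9)
colnames(mydata) <- paste0("gene", 1:9)
rownames(mydata) <- paste0("gene", 2:10)
mydata[upper.tri(mydata)] <- NA

# Long format: one row per gene pair, empty cells dropped
meltdata <- melt(mydata, na.rm = TRUE)

# Hypothetical stand-in for geneLoc (gene in column one, chromosome in column two)
geneLoc <- data.frame(
  Gene = paste0("gene", 1:10),
  Chromosome = rep(c("chr1", "chr2"), each = 5),
  stringsAsFactors = FALSE
)

# Look up the chromosome of each gene in the pair by name
meltdata$Chromosome1 <- geneLoc$Chromosome[match(meltdata$Var1, geneLoc$Gene)]
meltdata$Chromosome2 <- geneLoc$Chromosome[match(meltdata$Var2, geneLoc$Gene)]

head(meltdata)
```

The same lookup may also work directly on the scatter plot, assuming the row names of data$pco$li are the gene names: something like col = as.integer(factor(geneLoc$Chromosome[match(rownames(data$pco$li), geneLoc$Gene)])) passed to plot() would give each chromosome its own colour.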
Upvotes: 1