Reputation: 1
I start with a set of 100 genes that are most often found in a certain biological substance; this list is called "top100". Using merge() I manage to get the counts for each of these 100 proteins out of this dataset for each of the samples. I want to plot the counts for each individual protein, per sample.
So basically I want a plot that shows, for instance, protein PKM and then plots the counts for each of the samples (N=2 in this case). Then I want to repeat this process for all 100 proteins in individual plots.
   row.names Gene.Symbol Normalised.count.(B) Normalised.count.(A)
1          1         A2M            46.073855           280.736354
2          5       ACTN4             0.000000            10.436296
3          8       ALDOA            39.354751            61.574145
4          9       ANXA1             1.919744             1.043630
5         13       ANXA5             8.638848             0.000000
6         17         BSG             5.759232             1.043630
7         22        CD81             1.919744             2.087259
8         23         CD9             2.879616             4.174518
9         25        CFL1             5.759232            10.436296
10        26       CLIC1             1.919744            10.436296
This is 1/10 of the total list. For each gene symbol I want both normalised count values plotted, where
x1 = gene symbol, y = Normalised.count.(A)
x2 = gene symbol, y = Normalised.count.(B)
This is what I have so far to get to the final list:
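library("openxlsx")
library("dplyr")
library("ggplot2")
library("reshape2")
library("gdata")
# Read the full protein report and the top-100 gene list (first sheet of each file)
protein_report <- read.xlsx(file.choose(), sheet = 1)
top100 <- read.xlsx(file.choose(), sheet = 1)
# Names of all columns in the report that contain "Norm"
norm <- matchcols(protein_report, with = "Norm")
# Keep the top-100 genes with their normalised counts; drop rows with missing values
top <- na.omit(merge(top100, protein_report[c("Gene.names", norm)], by.x = "Gene.Symbol", by.y = "Gene.names", all.x = TRUE, all.y = FALSE))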
How can I plot these values?
Upvotes: 0
Views: 2205
Reputation: 1721
You could use the gather function from tidyr to first reshape the data into long format and then plot it with ggplot2:
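library(tidyr)
library(ggplot2)
# Reshape the merged table (`top` from the question) into long format:
# one row per gene and sample. Backticks are needed because of the parentheses in the column names.
plotData <- top %>% gather(type, Normalised.count,
                           `Normalised.count.(A)`, `Normalised.count.(B)`)
# group = type lets geom_line() connect points across the discrete x axis
ggplot(plotData, aes(x = Gene.Symbol, y = Normalised.count, color = type, group = type)) +
  geom_line() ## For a line plot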
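If you want one plot per protein, as asked in the question, something along these lines might work. This is only a sketch under assumptions: `top_example` is a hypothetical stand-in for your merged table built from the first three rows you posted, the names `long`, `plots` and `all_panels` are made up for illustration, and I used `pivot_longer()` (the newer tidyr counterpart of `gather()`) together with bars instead of lines.

```r
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for the merged table `top` (first three rows from the question)
top_example <- data.frame(
  Gene.Symbol = c("A2M", "ACTN4", "ALDOA"),
  B = c(46.073855, 0.000000, 39.354751),
  A = c(280.736354, 10.436296, 61.574145)
)
# Restore the column names as they appear in the question
names(top_example)[2:3] <- c("Normalised.count.(B)", "Normalised.count.(A)")

# Long format: one row per gene and sample
long <- pivot_longer(
  top_example,
  cols = -Gene.Symbol,
  names_to = "sample",
  values_to = "count"
)

# One ggplot object per gene, stored in a list named by gene symbol
plots <- lapply(split(long, long$Gene.Symbol), function(d) {
  ggplot(d, aes(x = sample, y = count)) +
    geom_col() +
    ggtitle(d$Gene.Symbol[1])
})

# Alternative: a single figure with one panel per gene
all_panels <- ggplot(long, aes(x = sample, y = count)) +
  geom_col() +
  facet_wrap(~ Gene.Symbol, scales = "free_y")
```

`print(plots[["A2M"]])` would then draw the plot for a single gene, and `print(all_panels)` shows all genes at once. With 100 genes the faceted figure may get crowded, so the list of individual plots is probably the better fit for your case.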
Upvotes: 1