Reputation: 1069
I have a matrix of gene expression values (RPKM), with samples as rows and genes as columns.
Below is some example data. The original data has more than 800 samples.
LINP1 EGFR RB1 TP53 CDKN2A MYC
Sample1 0.02 0.038798682 0.1423662 2.778587067 0.471403939 18.93687655
Sample2 0 0.059227225 0.208765213 0.818810739 0.353671882 1.379027685
Sample3 0 0.052116384 0.230437735 2.535040249 0.504061015 9.773089223
Sample4 0.06 0.199264618 0.261100548 2.516963635 0.63659138 11.01441624
Sample5 0 0.123521916 0.273330986 2.751309388 0.623572499 34.0563519
Sample6 0 0.128767634 0.263491811 2.882878373 0.359322715 13.02402045
Sample7 0 0.080097356 0.234511372 3.568192768 0.386217698 9.068928569
Sample8 0 0.017421323 0.247775683 5.109428797 0.068760572 15.7490551
Sample9 0 2.10281137 0.401582013 8.202902242 0.140596724 60.25989178
To make a scatter plot showing the correlation between two genes, I used ggscatter:
library(ggpubr)  # provides ggscatter()
ggscatter(A2, x = "LINP1", y = "EGFR",
          add = "reg.line", conf.int = FALSE,
          cor.coef = TRUE, cor.method = "pearson",
          xlab = "LINP1", ylab = "EGFR", xscale = "log2", yscale = "log2")
The scatter plot looks like this
I want to make a scatter plot like Fig. 2g in this research paper, where LINP1 expression is shown against all other genes in a single plot. Is this possible with code?
Upvotes: 4
Views: 2186
Reputation: 1314
Since you are computing Pearson correlations, you get the same result by drawing a scatterplot and adding the fitted line from a linear regression model. You can do that with ggplot2::geom_smooth(), combined with a scatterplot faceted by gene.
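To illustrate why the regression line carries the same information as the Pearson coefficient, here is a minimal base-R sketch. The vectors `x` and `y` are hypothetical toy values, not data from the question. For a simple linear regression, the slope equals r * sd(y) / sd(x) and the model's R-squared equals r squared.

```r
# Hypothetical toy data, for illustration only
x <- c(1.0, 2.0, 3.0, 4.0, 5.0)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Pearson correlation and a simple linear regression on the same data
r   <- cor(x, y, method = "pearson")
fit <- lm(y ~ x)

# Slope reconstructed from r, and R-squared taken from the fitted model
slope_from_r <- r * sd(y) / sd(x)
r_squared    <- summary(fit)$r.squared

# Both comparisons should report TRUE
print(all.equal(unname(coef(fit)["x"]), slope_from_r))
print(all.equal(r_squared, r^2))
```

The sign of the slope therefore matches the sign of r, so the fitted line shows the direction of the correlation, while r itself still has to be computed separately if you want to print it on the plot.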
Edit:
Updated to use a log2() transformation on both scales per the OP's comment. Note that transformations can sometimes produce invalid values. Your data has 0s, so a log2() transformation returns -Inf:
library(tidyr)
library(ggplot2)
df <- read.table(text = "
LINP1 EGFR RB1 TP53 CDKN2A MYC
Sample1 0.02 0.038798682 0.1423662 2.778587067 0.471403939 18.93687655
Sample2 0 0.059227225 0.208765213 0.818810739 0.353671882 1.379027685
Sample3 0 0.052116384 0.230437735 2.535040249 0.504061015 9.773089223
Sample4 0.06 0.199264618 0.261100548 2.516963635 0.63659138 11.01441624
Sample5 0 0.123521916 0.273330986 2.751309388 0.623572499 34.0563519
Sample6 0 0.128767634 0.263491811 2.882878373 0.359322715 13.02402045
Sample7 0 0.080097356 0.234511372 3.568192768 0.386217698 9.068928569
Sample8 0 0.017421323 0.247775683 5.109428797 0.068760572 15.7490551
Sample9 0 2.10281137 0.401582013 8.202902242 0.140596724 60.25989178", header = TRUE)
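df %>%
  # Reshape to long format: one row per sample and gene, keeping LINP1 as its own column
  gather(key = variable, value = values, EGFR:MYC) %>%
  ggplot(aes(LINP1, values)) +
  geom_point() +
  # One panel per gene
  facet_grid(. ~ variable, scales = "free_x") +
  # Least-squares regression line in each panel, without a confidence band
  geom_smooth(method = "lm", se = FALSE) +
  # log2 axes: zero values map to -Inf
  scale_y_continuous(trans = "log2") +
  scale_x_continuous(trans = "log2")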
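The faceted plot does not print the correlation coefficients that ggscatter showed with cor.coef = TRUE. As a possible complement, here is a minimal base-R sketch that computes Pearson's r between LINP1 and every other gene. It assumes a pseudocount of 1 before the log2 transformation, i.e. log2(x + 1), so that zeros stay finite. The pseudocount is my assumption, not something from the question, and it gives different values than a plain log2(). The names `log_df`, `other_genes`, and `pearson_r` are illustrative only.

```r
# Example data from the question; the first column becomes the row names
df <- read.table(text = "
LINP1 EGFR RB1 TP53 CDKN2A MYC
Sample1 0.02 0.038798682 0.1423662 2.778587067 0.471403939 18.93687655
Sample2 0 0.059227225 0.208765213 0.818810739 0.353671882 1.379027685
Sample3 0 0.052116384 0.230437735 2.535040249 0.504061015 9.773089223
Sample4 0.06 0.199264618 0.261100548 2.516963635 0.63659138 11.01441624
Sample5 0 0.123521916 0.273330986 2.751309388 0.623572499 34.0563519
Sample6 0 0.128767634 0.263491811 2.882878373 0.359322715 13.02402045
Sample7 0 0.080097356 0.234511372 3.568192768 0.386217698 9.068928569
Sample8 0 0.017421323 0.247775683 5.109428797 0.068760572 15.7490551
Sample9 0 2.10281137 0.401582013 8.202902242 0.140596724 60.25989178", header = TRUE)

# Assumption: a pseudocount of 1 keeps zero RPKM values finite after log2
log_df <- log2(df + 1)

# Pearson correlation of LINP1 against each remaining gene
other_genes <- setdiff(names(log_df), "LINP1")
pearson_r <- sapply(other_genes, function(g) {
  cor(log_df$LINP1, log_df[[g]], method = "pearson")
})

print(round(pearson_r, 3))
```

With the full data set of more than 800 samples, the same loop should work unchanged, since it only depends on the column names.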
Upvotes: 3