How to make a scatterplot showing correlation between a single gene vs multiple genes?

Question

I have a matrix of gene expression values (RPKM) with samples as rows and genes as columns.

The following is example data. The original data has more than 800 samples.

        LINP1   EGFR            RB1       TP53         CDKN2A      MYC
Sample1 0.02   0.038798682  0.1423662   2.778587067 0.471403939 18.93687655
Sample2 0      0.059227225  0.208765213 0.818810739 0.353671882 1.379027685
Sample3 0      0.052116384  0.230437735 2.535040249 0.504061015 9.773089223
Sample4 0.06   0.199264618  0.261100548 2.516963635 0.63659138  11.01441624
Sample5 0      0.123521916  0.273330986 2.751309388 0.623572499 34.0563519
Sample6 0      0.128767634  0.263491811 2.882878373 0.359322715 13.02402045
Sample7 0      0.080097356  0.234511372 3.568192768 0.386217698 9.068928569
Sample8 0      0.017421323  0.247775683 5.109428797 0.068760572 15.7490551
Sample9 0      2.10281137   0.401582013 8.202902242 0.140596724 60.25989178

To make a scatter plot showing the correlation between two genes, I used ggscatter() from the ggpubr package:

ggscatter(A2, x = "LINP1", y = "EGFR", 
          add = "reg.line", conf.int = FALSE, 
          cor.coef = TRUE, cor.method = "pearson",
          xlab = "LINP1", ylab = "EGFR", xscale="log2", yscale="log2")

The scatter plot looks like this

scatterplot

And I want to make a scatter plot like this

scatterplot

This is Fig. 2g in this research paper, where LINP1 expression is shown against all the other genes in a single plot. Is this possible with code?

MHammer · Accepted Answer

Because you are using Pearson correlations, you get an equivalent picture by drawing a scatterplot and overlaying the fitted line from a linear regression model. ggplot2::geom_smooth(method = "lm") draws that line, and faceting the scatterplot by gene gives one panel per gene.

Edit: Updated to use a log2() transformation on both scales per the OP's comment. Note that transformations can produce non-finite values. Your data contain zeros, for which log2() returns -Inf:
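To illustrate the link between Pearson's r and the regression line, here is a minimal base-R sketch using the untransformed LINP1 and EGFR values from the question. The variable names (linp1, egfr, fit and so on) are illustrative, not part of the original answer.

```r
# LINP1 and EGFR values for the nine example samples in the question
linp1 <- c(0.02, 0, 0, 0.06, 0, 0, 0, 0, 0)
egfr  <- c(0.038798682, 0.059227225, 0.052116384, 0.199264618, 0.123521916,
           0.128767634, 0.080097356, 0.017421323, 2.10281137)

# Pearson correlation coefficient
r <- cor(linp1, egfr, method = "pearson")

# Simple linear regression of EGFR on LINP1
fit <- lm(egfr ~ linp1)
slope <- unname(coef(fit)["linp1"])
r_squared <- summary(fit)$r.squared

# The regression slope is r rescaled by the ratio of standard deviations,
# and the model's R-squared is the square of r
same_slope <- isTRUE(all.equal(slope, r * sd(egfr) / sd(linp1)))
same_r2 <- isTRUE(all.equal(r_squared, r^2))
print(c(same_slope = same_slope, same_r2 = same_r2))
```

Both checks hold for any simple linear regression with one predictor, so the fitted line and the Pearson coefficient describe the same linear relationship.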

library(tidyr)
library(ggplot2)

df <- read.table(text = "
LINP1   EGFR            RB1       TP53         CDKN2A      MYC
Sample1 0.02   0.038798682  0.1423662   2.778587067 0.471403939 18.93687655
Sample2 0      0.059227225  0.208765213 0.818810739 0.353671882 1.379027685
Sample3 0      0.052116384  0.230437735 2.535040249 0.504061015 9.773089223
Sample4 0.06   0.199264618  0.261100548 2.516963635 0.63659138  11.01441624
Sample5 0      0.123521916  0.273330986 2.751309388 0.623572499 34.0563519
Sample6 0      0.128767634  0.263491811 2.882878373 0.359322715 13.02402045
Sample7 0      0.080097356  0.234511372 3.568192768 0.386217698 9.068928569
Sample8 0      0.017421323  0.247775683 5.109428797 0.068760572 15.7490551
Sample9 0      2.10281137   0.401582013 8.202902242 0.140596724 60.25989178", header = TRUE)



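# Reshape to long format: one row per sample and gene, keeping LINP1 as its own column
df %>% 
  gather(key = variable, value = values, EGFR:MYC) %>% 
  ggplot(aes(LINP1, values)) + 
  geom_point() + 
  # One panel per gene; each panel gets its own y-axis range
  facet_wrap(~ variable, scales = "free_y") + 
  # Least-squares regression line without a confidence band
  geom_smooth(method = "lm", se = FALSE) + 
  scale_y_continuous(trans = "log2") + 
  scale_x_continuous(trans = "log2")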
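The question asks for all genes in a single plot rather than separate panels. One possible variant, sketched here under assumptions, maps gene to colour instead of faceting. The pseudocount of 0.01 is an arbitrary illustrative choice to keep zero RPKM values finite after log2(); the names long, pseudocount and p are hypothetical, and tidyr::pivot_longer() stands in for gather().

```r
library(tidyr)
library(ggplot2)

# Example data from the question; the sample IDs become row names
df <- read.table(text = "
LINP1 EGFR RB1 TP53 CDKN2A MYC
Sample1 0.02 0.038798682 0.1423662 2.778587067 0.471403939 18.93687655
Sample2 0 0.059227225 0.208765213 0.818810739 0.353671882 1.379027685
Sample3 0 0.052116384 0.230437735 2.535040249 0.504061015 9.773089223
Sample4 0.06 0.199264618 0.261100548 2.516963635 0.63659138 11.01441624
Sample5 0 0.123521916 0.273330986 2.751309388 0.623572499 34.0563519
Sample6 0 0.128767634 0.263491811 2.882878373 0.359322715 13.02402045
Sample7 0 0.080097356 0.234511372 3.568192768 0.386217698 9.068928569
Sample8 0 0.017421323 0.247775683 5.109428797 0.068760572 15.7490551
Sample9 0 2.10281137 0.401582013 8.202902242 0.140596724 60.25989178", header = TRUE)

# Assumption: a small pseudocount so that zeros stay finite under log2()
pseudocount <- 0.01

# Long format: one row per sample and gene, LINP1 kept as its own column
long <- pivot_longer(df, cols = -LINP1, names_to = "gene", values_to = "expression")

# One panel, one colour and one regression line per gene
p <- ggplot(long, aes(x = log2(LINP1 + pseudocount),
                      y = log2(expression + pseudocount),
                      colour = gene)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "log2(LINP1 + 0.01)", y = "log2(expression + 0.01)", colour = "Gene")

print(p)
```

The pseudocount changes the fitted lines, so it is worth trying more than one value, or filtering out zero-expression samples instead, before interpreting the slopes.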
