Labeling x-axis with another column from dataframe

Question

I have a dataframe derived from the output of a GWAS run. Each row is a SNP in the genome, with its Chromosome, Position, and P.value. From this dataframe, I'd like to generate a Manhattan plot where the x-axis goes from the first SNP on Chr 1 to the last SNP on Chr 5 and the y-axis is -log10(P.value). To do this, I generated an Index column so the SNPs are plotted in the correct order along the x-axis. However, I would like the x-axis to be labeled by the Chromosome column instead of the Index. I cannot simply put Chromosome on the x-axis, because then all the SNPs on a given chromosome would be plotted in a single column of points.

Here is an example dataframe to work with:

library(tidyverse)

df <- tibble(Index = seq(1, 500, by = 1),
             Chromosome = rep(seq(1, 5, by = 1), each = 100),
             Position = rep(seq(1, 500, by = 5), 5),
             P.value = sample(seq(1e-5, 1e-2, by = 1e-5), 500, replace = TRUE))

And the plot that I have so far:

df %>%
    ggplot(aes(x = Index, y = -log10(P.value), color = as.factor(Chromosome))) +
    geom_point()

I have tried playing around with the scale_x_discrete option, but haven't been able to figure out a solution.

Here is an example of a Manhattan Plot I found online. See how the x-axis is labeled according to the Chromosome? That is my desired output.

[Image: example Manhattan plot]

pedrostrusso · Accepted Answer

geom_jitter is your friend:

df %>%
    ggplot(aes(x = Chromosome, y = -log10(P.value), color = as.factor(Chromosome))) +
    geom_jitter()

Edit given OP's comment:

Using base R plot, you could do:

cols = sample(colors(), length(unique(df$Chromosome)))[df$Chromosome]

plot(df$Index, -log10(df$P.value), col=cols, xaxt="n")
axis(1, at=c(50, 150, 250, 350, 450), labels=c(1:5))

You'll need to specify exactly where you want each chromosome label to be for the axis function. Thanks to this post.
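If you would rather not hard-code the label positions, they can be computed from the data. The following is a minimal sketch, not part of the original answer: `toy` is a stand-in for the example dataframe, and `label_at` is a hypothetical name for the midpoint of each chromosome's Index range.

```r
set.seed(42)  # reproducible toy p-values

# Stand-in for the example dataframe (same shape: 5 chromosomes x 100 SNPs)
toy <- data.frame(
  Index      = 1:500,
  Chromosome = rep(1:5, each = 100),
  P.value    = runif(500, min = 1e-5, max = 1e-2)
)

# Midpoint of the Index range covered by each chromosome;
# the result is named by chromosome ("1", "2", ...)
label_at <- tapply(toy$Index, toy$Chromosome, function(i) mean(range(i)))

# Integer colours index into the current palette, one colour per chromosome
plot(toy$Index, -log10(toy$P.value), col = toy$Chromosome,
     xaxt = "n", xlab = "Chromosome", ylab = "-log10(P.value)")
axis(1, at = label_at, labels = names(label_at))
```

Because the positions come from the Index ranges themselves, the same code should keep working when chromosomes contain different numbers of SNPs.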

Edit #2:

I found an answer using ggplot2. Plot the points against Index as before, then use scale_x_continuous to place one break per chromosome and label each break with the chromosome number. Index is numeric, so the breaks belong on a continuous scale; scale_x_discrete (as you suggested) with numeric limits does not fit here. We also need to define the pos vector to get the position of the labels for the plot. I used the mean value of the Index column for each group as an example, but you can define it by hand if you wish.

# Label position for each chromosome: mean Index within the group
pos <- df %>% 
    group_by(Chromosome) %>% 
    summarize(avg = round(mean(Index))) %>% 
    pull(avg)

# Same plot as in the question, with the x-axis relabeled by chromosome
ggplot(df, aes(x = Index, y = -log10(P.value),
               color = as.factor(Chromosome))) +
    geom_point() +
    scale_x_continuous(breaks = pos,
          labels = unique(df$Chromosome))
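
Real chromosomes rarely contain the same number of SNPs, so here is a self-contained sketch of the same idea with unequal group sizes. It is an illustration under assumptions, not the original code: `toy`, `snps_per_chr`, and `axis_labels` are hypothetical names, the SNP counts are made up, and it uses geom_point with scale_x_continuous because Index is numeric.

```r
library(dplyr)
library(ggplot2)

set.seed(1)  # reproducible toy p-values

# Hypothetical data: five chromosomes with different numbers of SNPs
snps_per_chr <- c(120, 80, 100, 60, 140)
toy <- tibble(
  Chromosome = rep(seq_along(snps_per_chr), times = snps_per_chr),
  P.value    = runif(sum(snps_per_chr), min = 1e-5, max = 1e-2)
) %>%
  mutate(Index = row_number())

# One row per chromosome: midpoint of the Index range it occupies
axis_labels <- toy %>%
  group_by(Chromosome) %>%
  summarize(center = (min(Index) + max(Index)) / 2)

p <- ggplot(toy, aes(x = Index, y = -log10(P.value),
                     color = as.factor(Chromosome))) +
  geom_point(show.legend = FALSE) +  # the axis already names the chromosomes
  scale_x_continuous(breaks = axis_labels$center,
                     labels = axis_labels$Chromosome) +
  labs(x = "Chromosome")
print(p)
```

Keeping the breaks and labels in one summarized dataframe means they cannot drift out of order relative to each other.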
