Letin

Reputation: 1275

How to add additional values to x-axis based on data column values?

I am drawing scatter plots for multiple genes in a loop, producing one PNG file per gene. Each PNG file contains two scatter plots: Group1 on the left and Group2 on the right. Each group contains both healthy and unhealthy samples. My code works up to this point.

However, I now need to add the sample numbers to the x-axis at each time point for the healthy and unhealthy conditions, based on the 'samples' column. Each time point should be labelled "(number of samples in healthy condition, number of samples in unhealthy condition)". Can anyone help me achieve that?

My current example data frame 'data' for 2 genes is as follows:

Biomarkers  TimePoint   Group   Scale   Readings    Condition   samples
Gene1   52.5    Group1  25  0.027   Healthy 33
Gene1   52.5    Group2  25  0.024   Healthy 35
Gene1   57.5    Group1  25  0.029   Healthy 39
Gene1   57.5    Group2  25  0.023   Healthy 46
Gene1   62.5    Group1  25  0.030   Healthy 38
Gene1   62.5    Group2  25  0.024   Healthy 42
Gene1   67.5    Group1  25  0.033   Healthy 23
Gene1   67.5    Group2  25  0.026   Healthy 16
Gene2   52.5    Group1  25  0.051   Healthy 33
Gene2   52.5    Group2  25  0.046   Healthy 35
Gene2   57.5    Group1  25  0.052   Healthy 39
Gene2   57.5    Group2  25  0.048   Healthy 46
Gene2   62.5    Group1  25  0.049   Healthy 38
Gene2   62.5    Group2  25  0.051   Healthy 42
Gene2   67.5    Group1  25  0.051   Healthy 23
Gene2   67.5    Group2  25  0.052   Healthy 16
Gene1   52.5    Group1  25.01   0.026   Unhealthy   41
Gene1   52.5    Group2  25.01   0.023   Unhealthy   57
Gene1   57.5    Group1  25.01   0.027   Unhealthy   79
Gene1   57.5    Group2  25.01   0.024   Unhealthy   70
Gene1   62.5    Group1  25.01   0.030   Unhealthy   93
Gene1   62.5    Group2  25.01   0.025   Unhealthy   84
Gene1   67.5    Group1  25.01   0.033   Unhealthy   98
Gene1   67.5    Group2  25.01   0.022   Unhealthy   64
Gene2   52.5    Group1  25.01   0.043   Unhealthy   36
Gene2   52.5    Group2  25.01   0.044   Unhealthy   57
Gene2   57.5    Group1  25.01   0.043   Unhealthy   79
Gene2   57.5    Group2  25.01   0.043   Unhealthy   70
Gene2   62.5    Group1  25.01   0.043   Unhealthy   93
Gene2   62.5    Group2  25.01   0.044   Unhealthy   84
Gene2   67.5    Group1  25.01   0.044   Unhealthy   98
Gene2   67.5    Group2  25.01   0.044   Unhealthy   64
Gene1   52.5    Group1  50  0.035   Healthy 33
Gene1   52.5    Group2  50  0.029   Healthy 35
Gene1   57.5    Group1  50  0.039   Healthy 39
Gene1   57.5    Group2  50  0.031   Healthy 46
Gene1   62.5    Group1  50  0.038   Healthy 38
Gene1   62.5    Group2  50  0.030   Healthy 42
Gene1   67.5    Group1  50  0.040   Healthy 23
Gene1   67.5    Group2  50  0.035   Healthy 16
Gene2   52.5    Group1  50  0.058   Healthy 33
Gene2   52.5    Group2  50  0.053   Healthy 35
Gene2   57.5    Group1  50  0.059   Healthy 39
Gene2   57.5    Group2  50  0.056   Healthy 46
Gene2   62.5    Group1  50  0.057   Healthy 38
Gene2   62.5    Group2  50  0.058   Healthy 42
Gene2   67.5    Group1  50  0.061   Healthy 23
Gene2   67.5    Group2  50  0.058   Healthy 16
Gene1   52.5    Group1  50.01   0.038   Unhealthy   41
Gene1   52.5    Group2  50.01   0.030   Unhealthy   57
Gene1   57.5    Group1  50.01   0.038   Unhealthy   79
Gene1   57.5    Group2  50.01   0.031   Unhealthy   70
Gene1   62.5    Group1  50.01   0.040   Unhealthy   93
Gene1   62.5    Group2  50.01   0.032   Unhealthy   84
Gene1   67.5    Group1  50.01   0.043   Unhealthy   98
Gene1   67.5    Group2  50.01   0.033   Unhealthy   64
Gene2   52.5    Group1  50.01   0.052   Unhealthy   36
Gene2   52.5    Group2  50.01   0.051   Unhealthy   57
Gene2   57.5    Group1  50.01   0.052   Unhealthy   79
Gene2   57.5    Group2  50.01   0.051   Unhealthy   70
Gene2   62.5    Group1  50.01   0.052   Unhealthy   93
Gene2   62.5    Group2  50.01   0.052   Unhealthy   84
Gene2   67.5    Group1  50.01   0.053   Unhealthy   98
Gene2   67.5    Group2  50.01   0.051   Unhealthy   64
Gene1   52.5    Group1  75  0.045   Healthy 33
Gene1   52.5    Group2  75  0.038   Healthy 35
Gene1   57.5    Group1  75  0.048   Healthy 39
Gene1   57.5    Group2  75  0.041   Healthy 46
Gene1   62.5    Group1  75  0.047   Healthy 38
Gene1   62.5    Group2  75  0.040   Healthy 42
Gene1   67.5    Group1  75  0.050   Healthy 23
Gene1   67.5    Group2  75  0.043   Healthy 16
Gene2   52.5    Group1  75  0.066   Healthy 33
Gene2   52.5    Group2  75  0.064   Healthy 35
Gene2   57.5    Group1  75  0.065   Healthy 39
Gene2   57.5    Group2  75  0.064   Healthy 46
Gene2   62.5    Group1  75  0.068   Healthy 38
Gene2   62.5    Group2  75  0.071   Healthy 42
Gene2   67.5    Group1  75  0.070   Healthy 23
Gene2   67.5    Group2  75  0.071   Healthy 16
Gene1   52.5    Group1  75.01   0.057   Unhealthy   41
Gene1   52.5    Group2  75.01   0.041   Unhealthy   57
Gene1   57.5    Group1  75.01   0.056   Unhealthy   79
Gene1   57.5    Group2  75.01   0.040   Unhealthy   70
Gene1   62.5    Group1  75.01   0.057   Unhealthy   93
Gene1   62.5    Group2  75.01   0.043   Unhealthy   84
Gene1   67.5    Group1  75.01   0.059   Unhealthy   98
Gene1   67.5    Group2  75.01   0.046   Unhealthy   64
Gene2   52.5    Group1  75.01   0.063   Unhealthy   36
Gene2   52.5    Group2  75.01   0.060   Unhealthy   57
Gene2   57.5    Group1  75.01   0.061   Unhealthy   79
Gene2   57.5    Group2  75.01   0.062   Unhealthy   70
Gene2   62.5    Group1  75.01   0.062   Unhealthy   93
Gene2   62.5    Group2  75.01   0.062   Unhealthy   84
Gene2   67.5    Group1  75.01   0.061   Unhealthy   98
Gene2   67.5    Group2  75.01   0.060   Unhealthy   64
Gene1   52.5    Group1  100 0.056   Healthy 33
Gene1   52.5    Group2  100 0.046   Healthy 35
Gene1   57.5    Group1  100 0.063   Healthy 39
Gene1   57.5    Group2  100 0.048   Healthy 46
Gene1   62.5    Group1  100 0.060   Healthy 38
Gene1   62.5    Group2  100 0.052   Healthy 42
Gene1   67.5    Group1  100 0.064   Healthy 23
Gene1   67.5    Group2  100 0.055   Healthy 16
Gene2   52.5    Group1  100 0.082   Healthy 33
Gene2   52.5    Group2  100 0.074   Healthy 35
Gene2   57.5    Group1  100 0.070   Healthy 39
Gene2   57.5    Group2  100 0.075   Healthy 46
Gene2   62.5    Group1  100 0.074   Healthy 38
Gene2   62.5    Group2  100 0.078   Healthy 42
Gene2   67.5    Group1  100 0.080   Healthy 23
Gene2   67.5    Group2  100 0.075   Healthy 16
Gene1   52.5    Group1  100.01  0.090   Unhealthy   41
Gene1   52.5    Group2  100.01  0.060   Unhealthy   57
Gene1   57.5    Group1  100.01  0.093   Unhealthy   79
Gene1   57.5    Group2  100.01  0.053   Unhealthy   70
Gene1   62.5    Group1  100.01  0.089   Unhealthy   93
Gene1   62.5    Group2  100.01  0.057   Unhealthy   84
Gene1   67.5    Group1  100.01  0.089   Unhealthy   98
Gene1   67.5    Group2  100.01  0.065   Unhealthy   64
Gene2   52.5    Group1  100.01  0.074   Unhealthy   36
Gene2   52.5    Group2  100.01  0.074   Unhealthy   57
Gene2   57.5    Group1  100.01  0.077   Unhealthy   79
Gene2   57.5    Group2  100.01  0.078   Unhealthy   70
Gene2   62.5    Group1  100.01  0.073   Unhealthy   93
Gene2   62.5    Group2  100.01  0.073   Unhealthy   84
Gene2   67.5    Group1  100.01  0.072   Unhealthy   98
Gene2   67.5    Group2  100.01  0.074   Unhealthy   64

The dput for my data is:

dput(data)
structure(list(Biomarkers = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Gene1", 
"Gene2"), class = "factor"), TimePoint = c(52.5, 52.5, 57.5, 
57.5, 62.5, 62.5, 67.5, 67.5, 52.5, 52.5, 57.5, 57.5, 62.5, 62.5, 
67.5, 67.5, 52.5, 52.5, 57.5, 57.5, 62.5, 62.5, 67.5, 67.5, 52.5, 
52.5, 57.5, 57.5, 62.5, 62.5, 67.5, 67.5, 52.5, 52.5, 57.5, 57.5, 
62.5, 62.5, 67.5, 67.5, 52.5, 52.5, 57.5, 57.5, 62.5, 62.5, 67.5, 
67.5, 52.5, 52.5, 57.5, 57.5, 62.5, 62.5, 67.5, 67.5, 52.5, 52.5, 
57.5, 57.5, 62.5, 62.5, 67.5, 67.5, 52.5, 52.5, 57.5, 57.5, 62.5, 
62.5, 67.5, 67.5, 52.5, 52.5, 57.5, 57.5, 62.5, 62.5, 67.5, 67.5, 
52.5, 52.5, 57.5, 57.5, 62.5, 62.5, 67.5, 67.5, 52.5, 52.5, 57.5, 
57.5, 62.5, 62.5, 67.5, 67.5, 52.5, 52.5, 57.5, 57.5, 62.5, 62.5, 
67.5, 67.5, 52.5, 52.5, 57.5, 57.5, 62.5, 62.5, 67.5, 67.5, 52.5, 
52.5, 57.5, 57.5, 62.5, 62.5, 67.5, 67.5, 52.5, 52.5, 57.5, 57.5, 
62.5, 62.5, 67.5, 67.5), Group = structure(c(1L, 2L, 1L, 2L, 
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("Group1", 
"Group2"), class = "factor"), Scale = c(25, 25, 25, 25, 25, 25, 
25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25.01, 25.01, 25.01, 
25.01, 25.01, 25.01, 25.01, 25.01, 25.01, 25.01, 25.01, 25.01, 
25.01, 25.01, 25.01, 25.01, 50, 50, 50, 50, 50, 50, 50, 50, 50, 
50, 50, 50, 50, 50, 50, 50, 50.01, 50.01, 50.01, 50.01, 50.01, 
50.01, 50.01, 50.01, 50.01, 50.01, 50.01, 50.01, 50.01, 50.01, 
50.01, 50.01, 75, 75, 75, 75, 75, 75, 75, 75, 75, 75, 75, 75, 
75, 75, 75, 75, 75.01, 75.01, 75.01, 75.01, 75.01, 75.01, 75.01, 
75.01, 75.01, 75.01, 75.01, 75.01, 75.01, 75.01, 75.01, 75.01, 
100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 
100, 100, 100, 100.01, 100.01, 100.01, 100.01, 100.01, 100.01, 
100.01, 100.01, 100.01, 100.01, 100.01, 100.01, 100.01, 100.01, 
100.01, 100.01), Readings = c(0.027, 0.024, 0.029, 0.023, 0.03, 
0.024, 0.033, 0.026, 0.051, 0.046, 0.052, 0.048, 0.049, 0.051, 
0.051, 0.052, 0.026, 0.023, 0.027, 0.024, 0.03, 0.025, 0.033, 
0.022, 0.043, 0.044, 0.043, 0.043, 0.043, 0.044, 0.044, 0.044, 
0.035, 0.029, 0.039, 0.031, 0.038, 0.03, 0.04, 0.035, 0.058, 
0.053, 0.059, 0.056, 0.057, 0.058, 0.061, 0.058, 0.038, 0.03, 
0.038, 0.031, 0.04, 0.032, 0.043, 0.033, 0.052, 0.051, 0.052, 
0.051, 0.052, 0.052, 0.053, 0.051, 0.045, 0.038, 0.048, 0.041, 
0.047, 0.04, 0.05, 0.043, 0.066, 0.064, 0.065, 0.064, 0.068, 
0.071, 0.07, 0.071, 0.057, 0.041, 0.056, 0.04, 0.057, 0.043, 
0.059, 0.046, 0.063, 0.06, 0.061, 0.062, 0.062, 0.062, 0.061, 
0.06, 0.056, 0.046, 0.063, 0.048, 0.06, 0.052, 0.064, 0.055, 
0.082, 0.074, 0.07, 0.075, 0.074, 0.078, 0.08, 0.075, 0.09, 0.06, 
0.093, 0.053, 0.089, 0.057, 0.089, 0.065, 0.074, 0.074, 0.077, 
0.078, 0.073, 0.073, 0.072, 0.074), Condition = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Healthy", 
"Unhealthy"), class = "factor"), samples = c(33L, 35L, 39L, 46L, 
38L, 42L, 23L, 16L, 33L, 35L, 39L, 46L, 38L, 42L, 23L, 16L, 41L, 
57L, 79L, 70L, 93L, 84L, 98L, 64L, 36L, 57L, 79L, 70L, 93L, 84L, 
98L, 64L, 33L, 35L, 39L, 46L, 38L, 42L, 23L, 16L, 33L, 35L, 39L, 
46L, 38L, 42L, 23L, 16L, 41L, 57L, 79L, 70L, 93L, 84L, 98L, 64L, 
36L, 57L, 79L, 70L, 93L, 84L, 98L, 64L, 33L, 35L, 39L, 46L, 38L, 
42L, 23L, 16L, 33L, 35L, 39L, 46L, 38L, 42L, 23L, 16L, 41L, 57L, 
79L, 70L, 93L, 84L, 98L, 64L, 36L, 57L, 79L, 70L, 93L, 84L, 98L, 
64L, 33L, 35L, 39L, 46L, 38L, 42L, 23L, 16L, 33L, 35L, 39L, 46L, 
38L, 42L, 23L, 16L, 41L, 57L, 79L, 70L, 93L, 84L, 98L, 64L, 36L, 
57L, 79L, 70L, 93L, 84L, 98L, 64L)), class = "data.frame", row.names = c(NA, 
-128L))

The code I have now is this:

# Load libraries
library(ggplot2)
library(magrittr)
library(dplyr)
library(gridExtra)
library(grid)

proc_plot <- function(sub) {

  data_Group1 <- sub[sub$Group == "Group1", ]
  data_Group2 <- sub[sub$Group == "Group2", ]

  min_rdg <- min(data_Group1$Readings, data_Group2$Readings)
  max_rdg <- max(data_Group1$Readings, data_Group2$Readings)

  # Group1
  graph_Group1 <- ggplot(data_Group1, aes(x = TimePoint, y = Readings, group = Scale)) +
    labs(title="Group1", x="Time point", y="Readings") +
    scale_x_continuous(breaks = c(52.5, 57.5, 62.5, 67.5),
                       labels = c("1", "2", "3", "4")) +
    geom_line(aes(color = Scale, linetype=Condition), na.rm = TRUE, size = 0.8) +
    geom_point(aes(color = Scale),size = 2.5, na.rm = TRUE) +
    scale_color_continuous(name = "Scale", breaks = c(25, 50, 75, 100)) +
    scale_y_continuous(limits = c(min_rdg, max_rdg)) +
    theme(legend.key.height = unit(2.3, "cm"))

  # Group2
  graph_Group2 <- ggplot(data_Group2, aes(x = TimePoint, y = Readings, group = Scale)) +
    labs(title="Group2", x="Time point", y="Readings") +
    scale_x_continuous(breaks = c(52.5, 57.5, 62.5, 67.5),
                       labels = c("1", "2", "3", "4")) +
    geom_line(aes(color = Scale, linetype=Condition), na.rm = TRUE, size = 0.8) +
    geom_point(aes(color = Scale), size = 2.5, na.rm = TRUE) +
    scale_color_continuous(name = "Scale", breaks = c(25, 50, 75, 100)) +
    scale_y_continuous(limits = c(min_rdg, max_rdg)) +
    theme(legend.key.height = unit(2.3, "cm"))

  png (paste0("ScatterPlot_", sub$Biomarkers[[1]], ".png"), height=600, width=1111)
    output <- grid.arrange(graph_Group1, graph_Group2, nrow = 1, 
                           top=textGrob(sub$Biomarkers[[1]], gp=gpar(fontsize=20)))
  dev.off()

  return(output)
}


# BUILD PLOT LIST AND PNG FILES
plot_list <- by(data, data$Biomarkers, proc_plot)

dev.off()
grid.draw(plot_list$Gene1)

dev.off()
grid.draw(plot_list$Gene2)

I also attach the example png file for Gene1 below. I have manually added the numbers in red to highlight and show that it is exactly what I need to have for each gene/png file (but in black).

Any help appreciated. Thanking you.

Upvotes: 1

Views: 47

Answers (1)

Gregor Thomas

Reputation: 145965

You can use \n for a line break in your labels. E.g.,

scale_x_continuous(breaks = c(52.5, 57.5, 62.5, 67.5),
                   labels = c("1\n(33, 41)", "2\n(39, 79)", "3\n(38, 93)", "4\n(23, 98)"))

You can do this programmatically like this:

lab_df = data_Group1 %>% group_by(TimePoint) %>% 
  summarize(label = sprintf("(%s, %s)", first(samples[Condition == "Healthy"]), first(samples[Condition == "Unhealthy"])))
lab_df                                                  
# # A tibble: 4 x 2
#   TimePoint label   
#       <dbl> <chr>   
# 1      52.5 (33, 41)
# 2      57.5 (39, 79)
# 3      62.5 (38, 93)
# 4      67.5 (23, 98)

ggplot(...) + ... +
  scale_x_continuous(
    breaks = lab_df$TimePoint,
    labels = paste(1:nrow(lab_df), lab_df$label, sep = "\n")
  )
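
Note that `first()` silently keeps only the first `samples` value it finds for each condition, so the labels are only meaningful if `samples` is constant within each time point and condition. The following is a minimal base R sketch of how that assumption could be checked before building the labels. The `toy` data frame and the names `n_distinct_samples` and `all_constant` are illustrative assumptions, not part of the original code; with the real data you would pass the per-gene, per-group subset instead of `toy`.

```r
# Toy data in the same shape as the question's data (values are illustrative only)
toy <- data.frame(
  TimePoint = c(52.5, 52.5, 57.5, 57.5, 52.5, 52.5, 57.5, 57.5),
  Condition = rep(c("Healthy", "Unhealthy"), times = 4),
  Scale     = c(25, 25.01, 25, 25.01, 50, 50.01, 50, 50.01),
  samples   = c(33L, 41L, 39L, 79L, 33L, 41L, 39L, 79L)
)

# Count the distinct `samples` values per TimePoint/Condition combination
n_distinct_samples <- aggregate(
  samples ~ TimePoint + Condition,
  data = toy,
  FUN = function(x) length(unique(x))
)

# TRUE when every combination has exactly one sample count,
# i.e. taking only the first value loses no information
all_constant <- all(n_distinct_samples$samples == 1)
all_constant
```

If `all_constant` comes back `FALSE`, the rows of `n_distinct_samples` with a count above 1 show which time points carry conflicting sample numbers.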

Full solution, simplified to use a for loop instead of handling the groups separately, with the labels built programmatically:

proc_plot <- function(sub) {
  lab_df = sub %>% group_by(TimePoint, Group) %>% 
    summarize(label = sprintf(
      "(%s, %s)", 
      first(samples[Condition == "Healthy"]),
      first(samples[Condition == "Unhealthy"])
    )) %>%
    arrange(Group, TimePoint) # make sure things are in order

  min_rdg <- min(sub$Readings)
  max_rdg <- max(sub$Readings)

  graphs = list()

  for (i in unique(sub$Group)) {
    this_lab = lab_df[lab_df$Group == i, ]
    graphs[[i]] =  ggplot(sub[sub$Group == i, ], aes(x = TimePoint, y = Readings, group = Scale)) +
      labs(title = i, x = "Time point", y = "Readings") +
      scale_x_continuous(breaks = this_lab$TimePoint,
                         labels = paste(1:nrow(this_lab), this_lab$label, sep = "\n")) +
      geom_line(aes(color = Scale, linetype=Condition), na.rm = TRUE, size = 0.8) +
      geom_point(aes(color = Scale),size = 2.5, na.rm = TRUE) +
      scale_color_continuous(name = "Scale", breaks = c(25, 50, 75, 100)) +
      scale_y_continuous(limits = c(min_rdg, max_rdg)) +
      theme(legend.key.height = unit(2.3, "cm"))
  }

  png (paste0("ScatterPlot_", sub$Biomarkers[[1]], ".png"), height=600, width=1111)
    output <- grid.arrange(grobs = graphs, nrow = 1, 
                           top = textGrob(sub$Biomarkers[[1]], gp = gpar(fontsize = 20)))
  dev.off()
  return(output)
}

proc_plot(data[data$Biomarkers == "Gene1", ])

Upvotes: 1
