Reputation: 1275
I am making scatter plots for multiple genes using a loop, which produces one PNG file per gene. Each PNG contains two scatter plots: Group1 on the left and Group2 on the right. Each group contains both healthy and unhealthy samples. I have working code up to this point.
What I need to do now is add the sample numbers to the x-axis at each time point, for both the healthy and the unhealthy condition. The numbers come from the 'samples' column and should appear as "(number of samples in healthy condition, number of samples in unhealthy condition)" under each time point. Can anyone help me achieve that?
My current example data frame 'data' for 2 genes is as follows:
Biomarkers TimePoint Group Scale Readings Condition samples
Gene1 52.5 Group1 25 0.027 Healthy 33
Gene1 52.5 Group2 25 0.024 Healthy 35
Gene1 57.5 Group1 25 0.029 Healthy 39
Gene1 57.5 Group2 25 0.023 Healthy 46
Gene1 62.5 Group1 25 0.030 Healthy 38
Gene1 62.5 Group2 25 0.024 Healthy 42
Gene1 67.5 Group1 25 0.033 Healthy 23
Gene1 67.5 Group2 25 0.026 Healthy 16
Gene2 52.5 Group1 25 0.051 Healthy 33
Gene2 52.5 Group2 25 0.046 Healthy 35
Gene2 57.5 Group1 25 0.052 Healthy 39
Gene2 57.5 Group2 25 0.048 Healthy 46
Gene2 62.5 Group1 25 0.049 Healthy 38
Gene2 62.5 Group2 25 0.051 Healthy 42
Gene2 67.5 Group1 25 0.051 Healthy 23
Gene2 67.5 Group2 25 0.052 Healthy 16
Gene1 52.5 Group1 25.01 0.026 Unhealthy 41
Gene1 52.5 Group2 25.01 0.023 Unhealthy 57
Gene1 57.5 Group1 25.01 0.027 Unhealthy 79
Gene1 57.5 Group2 25.01 0.024 Unhealthy 70
Gene1 62.5 Group1 25.01 0.030 Unhealthy 93
Gene1 62.5 Group2 25.01 0.025 Unhealthy 84
Gene1 67.5 Group1 25.01 0.033 Unhealthy 98
Gene1 67.5 Group2 25.01 0.022 Unhealthy 64
Gene2 52.5 Group1 25.01 0.043 Unhealthy 36
Gene2 52.5 Group2 25.01 0.044 Unhealthy 57
Gene2 57.5 Group1 25.01 0.043 Unhealthy 79
Gene2 57.5 Group2 25.01 0.043 Unhealthy 70
Gene2 62.5 Group1 25.01 0.043 Unhealthy 93
Gene2 62.5 Group2 25.01 0.044 Unhealthy 84
Gene2 67.5 Group1 25.01 0.044 Unhealthy 98
Gene2 67.5 Group2 25.01 0.044 Unhealthy 64
Gene1 52.5 Group1 50 0.035 Healthy 33
Gene1 52.5 Group2 50 0.029 Healthy 35
Gene1 57.5 Group1 50 0.039 Healthy 39
Gene1 57.5 Group2 50 0.031 Healthy 46
Gene1 62.5 Group1 50 0.038 Healthy 38
Gene1 62.5 Group2 50 0.030 Healthy 42
Gene1 67.5 Group1 50 0.040 Healthy 23
Gene1 67.5 Group2 50 0.035 Healthy 16
Gene2 52.5 Group1 50 0.058 Healthy 33
Gene2 52.5 Group2 50 0.053 Healthy 35
Gene2 57.5 Group1 50 0.059 Healthy 39
Gene2 57.5 Group2 50 0.056 Healthy 46
Gene2 62.5 Group1 50 0.057 Healthy 38
Gene2 62.5 Group2 50 0.058 Healthy 42
Gene2 67.5 Group1 50 0.061 Healthy 23
Gene2 67.5 Group2 50 0.058 Healthy 16
Gene1 52.5 Group1 50.01 0.038 Unhealthy 41
Gene1 52.5 Group2 50.01 0.030 Unhealthy 57
Gene1 57.5 Group1 50.01 0.038 Unhealthy 79
Gene1 57.5 Group2 50.01 0.031 Unhealthy 70
Gene1 62.5 Group1 50.01 0.040 Unhealthy 93
Gene1 62.5 Group2 50.01 0.032 Unhealthy 84
Gene1 67.5 Group1 50.01 0.043 Unhealthy 98
Gene1 67.5 Group2 50.01 0.033 Unhealthy 64
Gene2 52.5 Group1 50.01 0.052 Unhealthy 36
Gene2 52.5 Group2 50.01 0.051 Unhealthy 57
Gene2 57.5 Group1 50.01 0.052 Unhealthy 79
Gene2 57.5 Group2 50.01 0.051 Unhealthy 70
Gene2 62.5 Group1 50.01 0.052 Unhealthy 93
Gene2 62.5 Group2 50.01 0.052 Unhealthy 84
Gene2 67.5 Group1 50.01 0.053 Unhealthy 98
Gene2 67.5 Group2 50.01 0.051 Unhealthy 64
Gene1 52.5 Group1 75 0.045 Healthy 33
Gene1 52.5 Group2 75 0.038 Healthy 35
Gene1 57.5 Group1 75 0.048 Healthy 39
Gene1 57.5 Group2 75 0.041 Healthy 46
Gene1 62.5 Group1 75 0.047 Healthy 38
Gene1 62.5 Group2 75 0.040 Healthy 42
Gene1 67.5 Group1 75 0.050 Healthy 23
Gene1 67.5 Group2 75 0.043 Healthy 16
Gene2 52.5 Group1 75 0.066 Healthy 33
Gene2 52.5 Group2 75 0.064 Healthy 35
Gene2 57.5 Group1 75 0.065 Healthy 39
Gene2 57.5 Group2 75 0.064 Healthy 46
Gene2 62.5 Group1 75 0.068 Healthy 38
Gene2 62.5 Group2 75 0.071 Healthy 42
Gene2 67.5 Group1 75 0.070 Healthy 23
Gene2 67.5 Group2 75 0.071 Healthy 16
Gene1 52.5 Group1 75.01 0.057 Unhealthy 41
Gene1 52.5 Group2 75.01 0.041 Unhealthy 57
Gene1 57.5 Group1 75.01 0.056 Unhealthy 79
Gene1 57.5 Group2 75.01 0.040 Unhealthy 70
Gene1 62.5 Group1 75.01 0.057 Unhealthy 93
Gene1 62.5 Group2 75.01 0.043 Unhealthy 84
Gene1 67.5 Group1 75.01 0.059 Unhealthy 98
Gene1 67.5 Group2 75.01 0.046 Unhealthy 64
Gene2 52.5 Group1 75.01 0.063 Unhealthy 36
Gene2 52.5 Group2 75.01 0.060 Unhealthy 57
Gene2 57.5 Group1 75.01 0.061 Unhealthy 79
Gene2 57.5 Group2 75.01 0.062 Unhealthy 70
Gene2 62.5 Group1 75.01 0.062 Unhealthy 93
Gene2 62.5 Group2 75.01 0.062 Unhealthy 84
Gene2 67.5 Group1 75.01 0.061 Unhealthy 98
Gene2 67.5 Group2 75.01 0.060 Unhealthy 64
Gene1 52.5 Group1 100 0.056 Healthy 33
Gene1 52.5 Group2 100 0.046 Healthy 35
Gene1 57.5 Group1 100 0.063 Healthy 39
Gene1 57.5 Group2 100 0.048 Healthy 46
Gene1 62.5 Group1 100 0.060 Healthy 38
Gene1 62.5 Group2 100 0.052 Healthy 42
Gene1 67.5 Group1 100 0.064 Healthy 23
Gene1 67.5 Group2 100 0.055 Healthy 16
Gene2 52.5 Group1 100 0.082 Healthy 33
Gene2 52.5 Group2 100 0.074 Healthy 35
Gene2 57.5 Group1 100 0.070 Healthy 39
Gene2 57.5 Group2 100 0.075 Healthy 46
Gene2 62.5 Group1 100 0.074 Healthy 38
Gene2 62.5 Group2 100 0.078 Healthy 42
Gene2 67.5 Group1 100 0.080 Healthy 23
Gene2 67.5 Group2 100 0.075 Healthy 16
Gene1 52.5 Group1 100.01 0.090 Unhealthy 41
Gene1 52.5 Group2 100.01 0.060 Unhealthy 57
Gene1 57.5 Group1 100.01 0.093 Unhealthy 79
Gene1 57.5 Group2 100.01 0.053 Unhealthy 70
Gene1 62.5 Group1 100.01 0.089 Unhealthy 93
Gene1 62.5 Group2 100.01 0.057 Unhealthy 84
Gene1 67.5 Group1 100.01 0.089 Unhealthy 98
Gene1 67.5 Group2 100.01 0.065 Unhealthy 64
Gene2 52.5 Group1 100.01 0.074 Unhealthy 36
Gene2 52.5 Group2 100.01 0.074 Unhealthy 57
Gene2 57.5 Group1 100.01 0.077 Unhealthy 79
Gene2 57.5 Group2 100.01 0.078 Unhealthy 70
Gene2 62.5 Group1 100.01 0.073 Unhealthy 93
Gene2 62.5 Group2 100.01 0.073 Unhealthy 84
Gene2 67.5 Group1 100.01 0.072 Unhealthy 98
Gene2 67.5 Group2 100.01 0.074 Unhealthy 64
The dput for my data is:
dput(data)
structure(list(Biomarkers = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Gene1",
"Gene2"), class = "factor"), TimePoint = c(52.5, 52.5, 57.5,
57.5, 62.5, 62.5, 67.5, 67.5, 52.5, 52.5, 57.5, 57.5, 62.5, 62.5,
67.5, 67.5, 52.5, 52.5, 57.5, 57.5, 62.5, 62.5, 67.5, 67.5, 52.5,
52.5, 57.5, 57.5, 62.5, 62.5, 67.5, 67.5, 52.5, 52.5, 57.5, 57.5,
62.5, 62.5, 67.5, 67.5, 52.5, 52.5, 57.5, 57.5, 62.5, 62.5, 67.5,
67.5, 52.5, 52.5, 57.5, 57.5, 62.5, 62.5, 67.5, 67.5, 52.5, 52.5,
57.5, 57.5, 62.5, 62.5, 67.5, 67.5, 52.5, 52.5, 57.5, 57.5, 62.5,
62.5, 67.5, 67.5, 52.5, 52.5, 57.5, 57.5, 62.5, 62.5, 67.5, 67.5,
52.5, 52.5, 57.5, 57.5, 62.5, 62.5, 67.5, 67.5, 52.5, 52.5, 57.5,
57.5, 62.5, 62.5, 67.5, 67.5, 52.5, 52.5, 57.5, 57.5, 62.5, 62.5,
67.5, 67.5, 52.5, 52.5, 57.5, 57.5, 62.5, 62.5, 67.5, 67.5, 52.5,
52.5, 57.5, 57.5, 62.5, 62.5, 67.5, 67.5, 52.5, 52.5, 57.5, 57.5,
62.5, 62.5, 67.5, 67.5), Group = structure(c(1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("Group1",
"Group2"), class = "factor"), Scale = c(25, 25, 25, 25, 25, 25,
25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25.01, 25.01, 25.01,
25.01, 25.01, 25.01, 25.01, 25.01, 25.01, 25.01, 25.01, 25.01,
25.01, 25.01, 25.01, 25.01, 50, 50, 50, 50, 50, 50, 50, 50, 50,
50, 50, 50, 50, 50, 50, 50, 50.01, 50.01, 50.01, 50.01, 50.01,
50.01, 50.01, 50.01, 50.01, 50.01, 50.01, 50.01, 50.01, 50.01,
50.01, 50.01, 75, 75, 75, 75, 75, 75, 75, 75, 75, 75, 75, 75,
75, 75, 75, 75, 75.01, 75.01, 75.01, 75.01, 75.01, 75.01, 75.01,
75.01, 75.01, 75.01, 75.01, 75.01, 75.01, 75.01, 75.01, 75.01,
100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100.01, 100.01, 100.01, 100.01, 100.01, 100.01,
100.01, 100.01, 100.01, 100.01, 100.01, 100.01, 100.01, 100.01,
100.01, 100.01), Readings = c(0.027, 0.024, 0.029, 0.023, 0.03,
0.024, 0.033, 0.026, 0.051, 0.046, 0.052, 0.048, 0.049, 0.051,
0.051, 0.052, 0.026, 0.023, 0.027, 0.024, 0.03, 0.025, 0.033,
0.022, 0.043, 0.044, 0.043, 0.043, 0.043, 0.044, 0.044, 0.044,
0.035, 0.029, 0.039, 0.031, 0.038, 0.03, 0.04, 0.035, 0.058,
0.053, 0.059, 0.056, 0.057, 0.058, 0.061, 0.058, 0.038, 0.03,
0.038, 0.031, 0.04, 0.032, 0.043, 0.033, 0.052, 0.051, 0.052,
0.051, 0.052, 0.052, 0.053, 0.051, 0.045, 0.038, 0.048, 0.041,
0.047, 0.04, 0.05, 0.043, 0.066, 0.064, 0.065, 0.064, 0.068,
0.071, 0.07, 0.071, 0.057, 0.041, 0.056, 0.04, 0.057, 0.043,
0.059, 0.046, 0.063, 0.06, 0.061, 0.062, 0.062, 0.062, 0.061,
0.06, 0.056, 0.046, 0.063, 0.048, 0.06, 0.052, 0.064, 0.055,
0.082, 0.074, 0.07, 0.075, 0.074, 0.078, 0.08, 0.075, 0.09, 0.06,
0.093, 0.053, 0.089, 0.057, 0.089, 0.065, 0.074, 0.074, 0.077,
0.078, 0.073, 0.073, 0.072, 0.074), Condition = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Healthy",
"Unhealthy"), class = "factor"), samples = c(33L, 35L, 39L, 46L,
38L, 42L, 23L, 16L, 33L, 35L, 39L, 46L, 38L, 42L, 23L, 16L, 41L,
57L, 79L, 70L, 93L, 84L, 98L, 64L, 36L, 57L, 79L, 70L, 93L, 84L,
98L, 64L, 33L, 35L, 39L, 46L, 38L, 42L, 23L, 16L, 33L, 35L, 39L,
46L, 38L, 42L, 23L, 16L, 41L, 57L, 79L, 70L, 93L, 84L, 98L, 64L,
36L, 57L, 79L, 70L, 93L, 84L, 98L, 64L, 33L, 35L, 39L, 46L, 38L,
42L, 23L, 16L, 33L, 35L, 39L, 46L, 38L, 42L, 23L, 16L, 41L, 57L,
79L, 70L, 93L, 84L, 98L, 64L, 36L, 57L, 79L, 70L, 93L, 84L, 98L,
64L, 33L, 35L, 39L, 46L, 38L, 42L, 23L, 16L, 33L, 35L, 39L, 46L,
38L, 42L, 23L, 16L, 41L, 57L, 79L, 70L, 93L, 84L, 98L, 64L, 36L,
57L, 79L, 70L, 93L, 84L, 98L, 64L)), class = "data.frame", row.names = c(NA,
-128L))
The code I have now is this:
# Load libraries
library(ggplot2)
library(magrittr)
library(dplyr)
library(gridExtra)
library(grid)
proc_plot <- function(sub) {
  data_Group1 <- sub[sub$Group == "Group1", ]
  data_Group2 <- sub[sub$Group == "Group2", ]
  min_rdg <- min(data_Group1$Readings, data_Group2$Readings)
  max_rdg <- max(data_Group1$Readings, data_Group2$Readings)
  # Group1
  graph_Group1 <- ggplot(data_Group1, aes(x = TimePoint, y = Readings, group = Scale)) +
    labs(title = "Group1", x = "Time point", y = "Readings") +
    scale_x_continuous(breaks = c(52.5, 57.5, 62.5, 67.5),
                       labels = c("1", "2", "3", "4")) +
    geom_line(aes(color = Scale, linetype = Condition), na.rm = TRUE, size = 0.8) +
    geom_point(aes(color = Scale), size = 2.5, na.rm = TRUE) +
    scale_color_continuous(name = "Scale", breaks = c(25, 50, 75, 100)) +
    scale_y_continuous(limits = c(min_rdg, max_rdg)) +
    theme(legend.key.height = unit(2.3, "cm"))
  # Group2
  graph_Group2 <- ggplot(data_Group2, aes(x = TimePoint, y = Readings, group = Scale)) +
    labs(title = "Group2", x = "Time point", y = "Readings") +
    scale_x_continuous(breaks = c(52.5, 57.5, 62.5, 67.5),
                       labels = c("1", "2", "3", "4")) +
    geom_line(aes(color = Scale, linetype = Condition), na.rm = TRUE, size = 0.8) +
    geom_point(aes(color = Scale), size = 2.5, na.rm = TRUE) +
    scale_color_continuous(name = "Scale", breaks = c(25, 50, 75, 100)) +
    scale_y_continuous(limits = c(min_rdg, max_rdg)) +
    theme(legend.key.height = unit(2.3, "cm"))
  png(paste0("ScatterPlot_", sub$Biomarkers[[1]], ".png"), height = 600, width = 1111)
  output <- grid.arrange(graph_Group1, graph_Group2, nrow = 1,
                         top = textGrob(sub$Biomarkers[[1]], gp = gpar(fontsize = 20)))
  dev.off()
  return(output)
}
# BUILD PLOT LIST AND PNG FILES
plot_list <- by(data, data$Biomarkers, proc_plot)
dev.off()
grid.draw(plot_list$Gene1)
dev.off()
grid.draw(plot_list$Gene2)
I also attach the example png file for Gene1 below. I have manually added the numbers in red to highlight and show that it is exactly what I need to have for each gene/png file (but in black).
Any help appreciated. Thanking you.
Upvotes: 1
Views: 47
Reputation: 145965
You can use \n for a line break in your labels. E.g.,
scale_x_continuous(breaks = c(52.5, 57.5, 62.5, 67.5),
labels = c("1\n(33, 41)", "2\n(39, 79)", "3\n(38, 93)", "4\n(23, 98)"))
You can do this programmatically like this:
lab_df = data_Group1 %>% group_by(TimePoint) %>%
summarize(label = sprintf("(%s, %s)", first(samples[Condition == "Healthy"]), first(samples[Condition == "Unhealthy"])))
lab_df
# # A tibble: 4 x 2
# TimePoint label
# <dbl> <chr>
# 1 52.5 (33, 41)
# 2 57.5 (39, 79)
# 3 62.5 (38, 93)
# 4 67.5 (23, 98)
ggplot(...) + ... +
scale_x_continuous(
breaks = lab_df$TimePoint,
labels = paste(1:nrow(lab_df), lab_df$label, sep = "\n")
)
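For readers who want to see the label-building step on its own, here is a minimal sketch in base R (no dplyr needed). The toy data frame copies the Gene1/Group1 sample counts from the question, and `make_axis_labels` is a hypothetical helper name introduced only for this illustration. It assumes `samples` is an integer column and takes the first matching value per time point and condition.

```r
# Toy data mirroring the question's columns (counts copied from Gene1, Group1).
toy <- data.frame(
  TimePoint = rep(c(52.5, 57.5, 62.5, 67.5), times = 2),
  Condition = rep(c("Healthy", "Unhealthy"), each = 4),
  samples   = c(33L, 39L, 38L, 23L, 41L, 79L, 93L, 98L)
)

# Hypothetical helper: builds one "index\n(healthy, unhealthy)" label per time point.
make_axis_labels <- function(df) {
  tp <- sort(unique(df$TimePoint))
  counts <- vapply(tp, function(tp_i) {
    rows <- df[df$TimePoint == tp_i, ]
    # Take the first count found for each condition at this time point.
    sprintf("(%d, %d)",
            rows$samples[rows$Condition == "Healthy"][1],
            rows$samples[rows$Condition == "Unhealthy"][1])
  }, character(1))
  data.frame(TimePoint = tp,
             label = paste(seq_along(tp), counts, sep = "\n"),
             stringsAsFactors = FALSE)
}

axis_labels <- make_axis_labels(toy)
# Print the labels as they would be laid out under the axis ticks.
cat(axis_labels$label, sep = "\n\n")
```

The resulting `axis_labels$TimePoint` and `axis_labels$label` can be passed to `breaks` and `labels` in `scale_x_continuous()` in the same way as `lab_df` above.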
Full-service solution: simplified to use a for loop instead of handling the groups separately, with the labels built programmatically.
proc_plot <- function(sub) {
  lab_df = sub %>% group_by(TimePoint, Group) %>%
    summarize(label = sprintf(
      "(%s, %s)",
      first(samples[Condition == "Healthy"]),
      first(samples[Condition == "Unhealthy"])
    )) %>%
    arrange(Group, TimePoint) # make sure things are in order
  min_rdg <- min(sub$Readings)
  max_rdg <- max(sub$Readings)
  graphs = list()
  for (i in unique(sub$Group)) {
    this_lab = lab_df[lab_df$Group == i, ]
    graphs[[i]] = ggplot(sub[sub$Group == i, ], aes(x = TimePoint, y = Readings, group = Scale)) +
      labs(title = i, x = "Time point", y = "Readings") +
      scale_x_continuous(breaks = this_lab$TimePoint,
                         labels = paste(1:nrow(this_lab), this_lab$label, sep = "\n")) +
      geom_line(aes(color = Scale, linetype = Condition), na.rm = TRUE, size = 0.8) +
      geom_point(aes(color = Scale), size = 2.5, na.rm = TRUE) +
      scale_color_continuous(name = "Scale", breaks = c(25, 50, 75, 100)) +
      scale_y_continuous(limits = c(min_rdg, max_rdg)) +
      theme(legend.key.height = unit(2.3, "cm"))
  }
  png(paste0("ScatterPlot_", sub$Biomarkers[[1]], ".png"), height = 600, width = 1111)
  output <- grid.arrange(grobs = graphs, nrow = 1,
                         top = textGrob(sub$Biomarkers[[1]], gp = gpar(fontsize = 20)))
  dev.off()
  return(output)
}
proc_plot(data[data$Biomarkers == "Gene1", ])
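One caveat: `first()` silently picks a single value per group, so the labels are only meaningful if `samples` is the same on every row sharing a gene, time point, group and condition (that is, across all `Scale` levels). This appears to hold for the question's data. A quick way to check is sketched below in base R. The toy data frame is a hand-copied subset of the Gene1/Group1 rows, and the names `n_distinct_counts` and `inconsistent` are made up for this example.

```r
# Toy subset: Gene1, Group1, two time points, four Scale levels.
toy <- data.frame(
  Biomarkers = "Gene1",
  TimePoint  = rep(c(52.5, 57.5), times = 4),
  Group      = "Group1",
  Scale      = rep(c(25, 25.01, 50, 50.01), each = 2),
  Condition  = rep(rep(c("Healthy", "Unhealthy"), each = 2), times = 2),
  samples    = rep(c(33L, 39L, 41L, 79L), times = 2)
)

# Count distinct 'samples' values per gene / time point / group / condition.
n_distinct_counts <- aggregate(
  samples ~ Biomarkers + TimePoint + Group + Condition,
  data = toy,
  FUN  = function(x) length(unique(x))
)

# Any row left here means first() would be hiding conflicting sample counts.
inconsistent <- n_distinct_counts[n_distinct_counts$samples > 1, ]
nrow(inconsistent)
```

If `inconsistent` has zero rows, taking the first value per group loses nothing. Running the same `aggregate()` call on the full `data` before plotting would flag any gene where the counts vary with `Scale`.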
Upvotes: 1