Reputation: 41
I am not an expert in R, but I am trying my best, and I would appreciate some assistance.
I have data structured as follows:
POPs: num[1:3000] 3,4,5,6,7,....
PM1: num[1:3000] 3,4,5,6,7,....
PM2: num[1:3000] 3,4,5,6,7,....
PM3: num[1:3000] 3,4,5,6,7,....
PM4: num[1:3000] 3,4,5,6,7,....
.. etc
I want to run a regression analysis for each PM (PM1, PM2, PM3, ...) and put them all into one figure (as in the picture). I also want to add the R2, RMSE, and MAE, the regression line, and the 1:1 line to each panel.
The x is POPs and the y is PM1, PM2, PM3, etc.
I can do this for each PM (y-axis) individually in the code (aes(x=POPs, y=PM1)). However, that produces a lot of figures, and it would be better to combine them into one. How can I put all the PMs on a single y in the code? I think it needs some more advanced looping, which is unfortunately beyond my current level.
library(ggplot2)
library(ggpubr)  # provides stat_cor() and stat_regline_equation()

ggplot(data = Plot, aes(x = POPs, y = PM1)) +
  stat_smooth(method = "lm", se = FALSE, color = "black", formula = y ~ x) +
  geom_point(size = 0.3) +
  stat_cor(aes(label = paste(..rr.label..)),  # adds R^2 value
           r.accuracy = 0.01,
           label.x = 0, label.y = 375, size = 4) +
  stat_regline_equation(aes(label = ..eq.label..),  # adds equation to linear regression
                        label.x = 0, label.y = 400, size = 4)
Based on Behnam Hedayat's answer below, with some code modifications from my side and from Allan Cameron, I can say it now works perfectly:
library(tidyverse)
library(ggpubr)  # provides stat_cor() and stat_regline_equation()

# change format of df to longer
Plot %>%
  pivot_longer(cols = starts_with("PEM"),
               names_to = "PEMs", values_to = "PEMs_value") -> df2

df2 %>%
  ggplot(aes(POPs, PEMs_value)) +
  geom_point(color = "#fe4300", size = 0.3) +
  geom_abline() +  # 1:1 line (default slope = 1, intercept = 0)
  geom_smooth(method = "lm", se = FALSE, formula = y ~ x, color = "#1b14fd") +
  labs(y = expression(bold(PLF~PM["2.5"]~("u"*g/m^"3"))),
       x = expression(bold(POPS~PM["2.5"]~("u"*g/m^"3")))) +
  stat_cor(aes(label = paste(..rr.label..)),  # adds R^2 value
           r.accuracy = 0.01,
           label.x = 0, label.y = 110, size = 3) +
  stat_regline_equation(aes(label = ..eq.label..),  # adds equation to linear regression
                        label.x = 0, label.y = 100, size = 3) +
  facet_wrap(~PEMs, ncol = 5)
Upvotes: 0
Views: 521
Reputation: 857
You can use the facet_wrap function of ggplot2, but first you have to reshape your dataset to longer format with the pivot_longer() function of tidyverse.
To add regression metrics to the plots, you can create a separate data frame containing the metrics for each group of the PMs variable, then use this data frame in the geom_text function, with x and y columns created for the x and y positions respectively.
Here I also used the caret package functions (R2, RMSE, MAE) to calculate the regression metrics.
# caret for calculating R2, MAE and RMSE
# tidyverse to reshape data to longer format
libs <- c("ggplot2", "tidyverse","caret")
suppressMessages(invisible(sapply(libs, library, character.only = TRUE)))
# sample dataset
df <- data.frame(POPs = sample(1:100, 100),
                 PM1 = sample(1:100, 100),
                 PM2 = sample(1:100, 100),
                 PM3 = sample(1:100, 100),
                 PM4 = sample(1:100, 100),
                 PM5 = sample(1:100, 100),
                 PM6 = sample(1:100, 100),
                 PM7 = sample(1:100, 100),
                 PM8 = sample(1:100, 100))
# change format of df to longer
df %>%
  pivot_longer(cols = starts_with("PM"),
               names_to = "PMs", values_to = "PMs_value") -> df2
head(df2, 10)
#> # A tibble: 10 × 3
#> POPs PMs PMs_value
#> <int> <chr> <int>
#> 1 5 PM1 88
#> 2 5 PM2 21
#> 3 5 PM3 51
#> 4 5 PM4 40
#> 5 5 PM5 40
#> 6 5 PM6 2
#> 7 5 PM7 30
#> 8 5 PM8 70
#> 9 52 PM1 13
#> 10 52 PM2 90
# create a dataframe of summary of regression metrics
summary_df <- df2 %>%
  group_by(PMs) %>%
  summarise(R2 = R2(PMs_value, POPs),
            RMSE = RMSE(PMs_value, POPs),
            MAE = MAE(PMs_value, POPs)) %>%
  mutate_if(is.numeric, round, digits = 2) %>%
  pivot_longer(cols = -PMs, names_to = "Metric", values_to = "Metric_value") %>%
  # add x column for x position of text and y column for y position
  mutate(x = rep(30, times = nrow(.)),
         y = rep(c(90, 80, 70), times = nrow(.) / 3)) %>%
  unite("Metric", Metric:Metric_value, sep = " = ")
summary_df
#> # A tibble: 24 × 4
#> PMs Metric x y
#> <chr> <chr> <dbl> <dbl>
#> 1 PM1 R2 = 0.03 30 90
#> 2 PM1 RMSE = 43.95 30 80
#> 3 PM1 MAE = 36.72 30 70
#> 4 PM2 R2 = 0.02 30 90
#> 5 PM2 RMSE = 37.83 30 80
#> 6 PM2 MAE = 29.76 30 70
#> 7 PM3 R2 = 0.02 30 90
#> 8 PM3 RMSE = 43.69 30 80
#> 9 PM3 MAE = 36.88 30 70
#> 10 PM4 R2 = 0.01 30 90
#> # … with 14 more rows
df2 %>%
  ggplot(aes(POPs, PMs_value)) +
  geom_point(size = 0.3) +
  geom_abline() +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~PMs, ncol = 4) +
  geom_text(data = summary_df,
            mapping = aes(x = x, y = y, label = Metric))
#> `geom_smooth()` using formula = 'y ~ x'
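One thing worth being aware of when reading these numbers: as far as I understand caret's defaults, R2() is the squared correlation between the two vectors, while RMSE() and MAE() measure the distance between the two vectors themselves, i.e. from the 1:1 line rather than from the fitted regression line. Below is a minimal base R sketch of those definitions. The helper name metrics_vs_reference and the toy numbers are made up for illustration and are not part of caret.

```r
# Hand-rolled versions of the three metrics, base R only.
# `metrics_vs_reference` is a hypothetical helper name, not a caret function.
metrics_vs_reference <- function(pred, obs) {
  c(R2   = cor(pred, obs)^2,             # squared Pearson correlation
    RMSE = sqrt(mean((pred - obs)^2)),   # root mean squared difference
    MAE  = mean(abs(pred - obs)))        # mean absolute difference
}

# Toy data (invented): pm is always exactly 1 above pops
pops <- c(1, 2, 3, 4, 5)
pm   <- c(2, 3, 4, 5, 6)

res <- metrics_vs_reference(pm, pops)
round(res, 2)
# R2, RMSE and MAE all come out as 1 here: the points lie on a perfect
# straight line (R2 = 1) that sits 1 unit above the 1:1 line (RMSE = MAE = 1)
```

If you want the error around the fitted regression line instead, you would compute the same quantities on the residuals of lm() for each group.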
Created on 2023-02-12 with reprex v2.0.2
Upvotes: 1
Reputation: 174641
You first need to get your data into the correct format - that is, to pivot it into long format, such that the PM column names are in a single column, and the values are in their own column too. Then you can use the names column as a faceting variable in ggplot:
library(tidyverse)
Plot %>%
  pivot_longer(-POPs) %>%
  ggplot(aes(POPs, value)) +
  geom_abline() +
  geom_point(color = "#fe4300", alpha = 0.5) +
  geom_smooth(method = "lm", se = FALSE, formula = y ~ x, color = "#fd1b14") +
  coord_cartesian(xlim = c(0, 100), ylim = c(0, 100)) +
  facet_wrap(.~name, nrow = 5, scales = "free") +
  theme_classic() +
  theme(strip.background = element_blank(),
        panel.border = element_rect(fill = NA))
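To see what the pivot step does on its own, here is a small sketch on a made-up two-row data frame (the object names wide and long and the values are invented for illustration). With no names_to or values_to given, pivot_longer() calls the new columns name and value, which is why the plot code above facets on name and maps value to y.

```r
library(tidyr)

# Tiny invented data frame in the same wide layout as `Plot`
wide <- data.frame(POPs = 1:2, PM1 = c(10, 20), PM2 = c(30, 40))

# Stack every column except POPs; the defaults name the new
# columns `name` (old column name) and `value` (its contents)
long <- pivot_longer(wide, -POPs)
long
# 4 rows: one per (POPs, PM column) combination, in the order
# PM1, PM2 for POPs = 1, then PM1, PM2 for POPs = 2
```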
Data used
Obviously we don't have your data (unless we were to transcribe the picture of your data, or you include the output of dput(Plot) in your question), so I have constructed a dummy data set with the same names and structure as your own:
set.seed(1)
Plot <- setNames(as.data.frame(cbind(1:115,
                                     replicate(17, sample(100, 115, TRUE)))),
                 c("POPs", paste0("PM", 1:17)))
str(Plot)
#> 'data.frame': 115 obs. of 18 variables:
#> $ POPs: int 1 2 3 4 5 6 7 8 9 10 ...
#> $ PM1 : int 68 39 1 34 87 43 14 82 59 51 ...
#> $ PM2 : int 1 29 78 22 70 28 37 61 46 67 ...
#> $ PM3 : int 99 77 57 71 25 31 37 92 28 62 ...
#> $ PM4 : int 60 65 64 53 5 44 35 23 29 35 ...
#> $ PM5 : int 48 7 27 43 9 8 86 45 6 27 ...
#> $ PM6 : int 65 2 9 49 69 91 93 66 31 78 ...
#> $ PM7 : int 50 89 8 54 31 69 12 30 9 66 ...
#> $ PM8 : int 21 7 99 42 33 94 5 5 4 11 ...
#> $ PM9 : int 22 56 58 55 99 96 5 52 47 55 ...
#> $ PM10: int 84 84 55 98 73 47 13 5 63 3 ...
#> $ PM11: int 41 83 91 7 78 32 49 14 92 84 ...
#> $ PM12: int 16 39 37 15 24 97 56 62 69 100 ...
#> $ PM13: int 94 69 53 37 70 57 50 51 18 29 ...
#> $ PM14: int 79 40 11 67 25 54 21 34 59 46 ...
#> $ PM15: int 5 89 74 34 47 85 29 24 46 98 ...
#> $ PM16: int 44 22 57 63 7 95 46 66 4 92 ...
#> $ PM17: int 38 57 48 75 8 28 21 2 84 95 ...
Created on 2023-02-11 with reprex v2.0.2
Upvotes: 1