Hala

Reputation: 41

Plotting multiple graphs of regression into one figure

I am not an expert in R, but I am trying my best, and I would appreciate some assistance.

I have data as follows:

    POPs: num[1:3000] 3,4,5,6,7,....
    PM1:  num[1:3000] 3,4,5,6,7,....
    PM2:  num[1:3000] 3,4,5,6,7,....
    PM3:  num[1:3000] 3,4,5,6,7,....
    PM4:  num[1:3000] 3,4,5,6,7,....
.. etc

I want to run a regression for each PM (PM1, PM2, PM3, ...) and put all of them into one figure, as in the picture. I also want to add the R2, RMSE, MAE, the regression line and the 1:1 line to each panel.

The x is POPs and the y is PM1 and PM2 and PM3 ... etc.

I can do this for each PM individually by setting the y aesthetic in the code (aes(x=POPs, y=PM1)). However, that produces a lot of figures, and it would be better to combine them into one. How can I put all the PMs on the y-axis of a single figure? I suspect this needs some more advanced looping, which is beyond my current level.

library(ggplot2)
library(ggpubr)  # provides stat_cor() and stat_regline_equation()

ggplot(data = Plot, aes(x = POPs, y = PM1)) +
  stat_smooth(method = "lm", se=FALSE, color="black", formula = y ~ x) +
  geom_point(size=0.3) +
  stat_cor(aes(label = paste(..rr.label..)), # adds R^2 value
           r.accuracy = 0.01,
           label.x = 0, label.y = 375, size = 4) +
  stat_regline_equation(aes(label = ..eq.label..), # adds equation to linear regression
                        label.x = 0, label.y = 400, size = 4)

Update: based on Behnam Hedayat's answer below, with some modifications from me and from Allan Cameron's answer, it now works perfectly:

library(tidyverse)
library(ggpubr)  # provides stat_cor() and stat_regline_equation()

# reshape the data to long format
Plot %>% pivot_longer(cols = starts_with("PEM"), names_to = "PEMs", values_to = "PEMs_value") -> df2

df2 %>% ggplot(aes(POPs, PEMs_value)) +
  geom_point(color = "#fe4300", size=0.3) +
  geom_abline()+
  geom_smooth(method='lm', se=FALSE, formula = y ~ x, color = "#1b14fd")+
  labs(y = expression(bold(PLF~PM["2.5"]~("u"*g/m^"3"))), x = expression(bold(POPS~PM["2.5"]~("u"*g/m^"3")))) +
  stat_cor(aes(label = paste(..rr.label..)), # adds R^2 value
           r.accuracy = 0.01,
           label.x = 0, label.y = 110, size = 3) +
  stat_regline_equation(aes(label = ..eq.label..), # adds equation to linear regression
                        label.x = 0, label.y = 100, size = 3) +
  facet_wrap(~PEMs, ncol=5)
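
A side note on the `..rr.label..` and `..eq.label..` notation used above: ggplot2 3.4.0 deprecated the dot-dot form in favour of `after_stat()`, so recent versions emit a warning. The following is a minimal sketch of the same labels written with `after_stat()`, assuming ggplot2 >= 3.4.0 and ggpubr are installed; the data frame `demo` is made up for illustration and is not the asker's data.

```r
library(ggplot2)
library(ggpubr)  # provides stat_cor() and stat_regline_equation()

# Made-up illustration data (an assumption, not the original Plot object)
set.seed(1)
demo <- data.frame(POPs = 1:20)
demo$PM1 <- demo$POPs + rnorm(20, sd = 2)

p <- ggplot(demo, aes(POPs, PM1)) +
  geom_point(size = 0.3) +
  geom_smooth(method = "lm", se = FALSE, formula = y ~ x) +
  # after_stat(rr.label) replaces ..rr.label..
  stat_cor(aes(label = after_stat(rr.label)), r.accuracy = 0.01) +
  # after_stat(eq.label) replaces ..eq.label..
  stat_regline_equation(aes(label = after_stat(eq.label)), label.y.npc = 0.85)
```

Printing `p` draws the plot; adding `facet_wrap()` on long-format data works the same way as in the code above.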
  

Upvotes: 0

Views: 521

Answers (2)

Behnam Hedayat

Reputation: 857

You can use the facet_wrap() function of ggplot2, but first you have to reshape your dataset to long format with the pivot_longer() function from the tidyverse.
To add regression metrics to the plots, create a separate data frame containing the metrics for each PM variable, then pass this data frame to geom_text(), with x and y columns giving the position of each label.

Here I also used the caret package's functions (R2, RMSE, MAE) to calculate the regression metrics.

# caret for calculating R2, MAE and RMSE
# tidyverse to reshape data to longer format
libs <- c("ggplot2", "tidyverse","caret")
suppressMessages(invisible(sapply(libs, library, character.only = TRUE)))

# sample dataset
df <- data.frame(POPs = sample(1:100, 100),
                 PM1 = sample(1:100, 100),
                 PM2 = sample(1:100, 100),
                 PM3 = sample(1:100, 100),
                 PM4 = sample(1:100,100),
                 PM5 = sample(1:100,100),
                 PM6 = sample(1:100,100),
                 PM7 = sample(1:100,100),
                 PM8 = sample(1:100,100))


# change format of df to longer
df %>%  pivot_longer(cols=starts_with("PM"),
                     names_to = "PMs", values_to = "PMs_value") -> df2

head(df2, 10)
#> # A tibble: 10 × 3
#>     POPs PMs   PMs_value
#>    <int> <chr>     <int>
#>  1     5 PM1          88
#>  2     5 PM2          21
#>  3     5 PM3          51
#>  4     5 PM4          40
#>  5     5 PM5          40
#>  6     5 PM6           2
#>  7     5 PM7          30
#>  8     5 PM8          70
#>  9    52 PM1          13
#> 10    52 PM2          90

# create  a dataframe of summary of regression metrics
summary_df <- df2 %>%
  group_by(PMs) %>%
  summarise(R2 = R2(PMs_value, POPs),
            RMSE=RMSE(PMs_value, POPs),
            MAE=MAE(PMs_value, POPs)) %>%
  mutate_if(is.numeric, round,digits=2) %>%
  pivot_longer(cols = -PMs, names_to = "Metric", values_to = "Metric_value") %>%
  # add x column for x position of text and y column for y position
  mutate(x = rep(30, times =nrow(.)),
         y = rep(c(90,80,70), times=nrow(.)/3)) %>%
  unite("Metric", Metric:Metric_value, sep = " = ")

summary_df
#> # A tibble: 24 × 4
#>    PMs   Metric           x     y
#>    <chr> <chr>        <dbl> <dbl>
#>  1 PM1   R2 = 0.03       30    90
#>  2 PM1   RMSE = 43.95    30    80
#>  3 PM1   MAE = 36.72     30    70
#>  4 PM2   R2 = 0.02       30    90
#>  5 PM2   RMSE = 37.83    30    80
#>  6 PM2   MAE = 29.76     30    70
#>  7 PM3   R2 = 0.02       30    90
#>  8 PM3   RMSE = 43.69    30    80
#>  9 PM3   MAE = 36.88     30    70
#> 10 PM4   R2 = 0.01       30    90
#> # … with 14 more rows

df2 %>% ggplot(aes(POPs, PMs_value)) +
  geom_point(size=0.3) +geom_abline()+
  geom_smooth(method='lm', se=FALSE)+
  facet_wrap(~PMs, ncol=4)+
  geom_text(data = summary_df,
            mapping = aes(x = x, y = y, label = Metric))
#> `geom_smooth()` using formula = 'y ~ x'

Created on 2023-02-12 with reprex v2.0.2
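
One caveat about the metrics above: caret's RMSE() and MAE() compare their two arguments directly, so RMSE(PMs_value, POPs) measures the scatter around the 1:1 line rather than around the fitted regression line. If the metrics of the fitted line are wanted instead, they can be computed from the residuals of lm(). The following is a minimal base-R sketch under that assumption; fit_metrics() is a hypothetical helper and the simulated data are made up for illustration.

```r
# Hypothetical helper (not part of the answer): metrics of the fitted
# regression line y ~ x, computed from the lm() residuals
fit_metrics <- function(x, y) {
  fit <- lm(y ~ x)
  res <- residuals(fit)
  c(R2   = summary(fit)$r.squared,
    RMSE = sqrt(mean(res^2)),
    MAE  = mean(abs(res)))
}

# Made-up data with the same column naming as the question
set.seed(42)
df <- data.frame(POPs = 1:50)
df$PM1 <- df$POPs + rnorm(50, sd = 5)
df$PM2 <- 0.5 * df$POPs + rnorm(50, sd = 10)

# One row of metrics per PM column
pm_cols <- grep("^PM", names(df), value = TRUE)
metrics <- t(sapply(df[pm_cols], function(y) fit_metrics(df$POPs, y)))
round(metrics, 2)
```

The resulting matrix has one row per PM column, so it can be converted to a data frame and reshaped into label rows for geom_text() in the same way as summary_df.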

Upvotes: 1

Allan Cameron

Reputation: 174641

You first need to get your data into the correct format - that is, to pivot it into long format, such that the PM column names are in a single column, and the values are in their own column too. Then you can use the names column as a faceting variable in ggplot:

library(tidyverse)

Plot %>%
  pivot_longer(-POPs) %>%
  ggplot(aes(POPs, value)) +
  geom_abline() +
  geom_point(color = "#fe4300", alpha = 0.5) +
  geom_smooth(method = "lm", se = FALSE, formula = y ~ x, color = "#fd1b14") +
  coord_cartesian(xlim = c(0, 100), ylim = c(0, 100)) +
  facet_wrap(.~name, nrow = 5, scales = "free") +
  theme_classic() +
  theme(strip.background = element_blank(),
        panel.border = element_rect(fill = NA))
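
To make the reshaping step concrete, here is a small sketch of what pivot_longer(-POPs) returns before the result reaches ggplot. With no names_to or values_to given, tidyr uses the default column names `name` and `value`, which is why the plot code refers to `value` and `.~name`. The three-row data frame `small` is made up for illustration.

```r
library(tidyr)

# Made-up three-row example with the same column naming as Plot
small <- data.frame(POPs = c(1, 2, 3),
                    PM1  = c(10, 20, 30),
                    PM2  = c(5, 15, 25))

# Every column except POPs is stacked: the old column names go into
# `name`, the measurements into `value`
long <- pivot_longer(small, -POPs)
long
```

Each original row yields one long-format row per PM column, so the three rows and two PM columns here give six rows.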


Data used

Obviously we don't have your data (unless we transcribe the picture of your data, or you include the output of dput(Plot) in your question), so I have constructed a dummy data set with the same names and structure as your own:

set.seed(1)

Plot <- setNames(as.data.frame(cbind(1:115, 
                 replicate(17, sample(100, 115, TRUE)))),
                 c("POPs", paste0("PM", 1:17)))

str(Plot)
#> 'data.frame':    115 obs. of  18 variables:
#>  $ POPs: int  1 2 3 4 5 6 7 8 9 10 ...
#>  $ PM1 : int  68 39 1 34 87 43 14 82 59 51 ...
#>  $ PM2 : int  1 29 78 22 70 28 37 61 46 67 ...
#>  $ PM3 : int  99 77 57 71 25 31 37 92 28 62 ...
#>  $ PM4 : int  60 65 64 53 5 44 35 23 29 35 ...
#>  $ PM5 : int  48 7 27 43 9 8 86 45 6 27 ...
#>  $ PM6 : int  65 2 9 49 69 91 93 66 31 78 ...
#>  $ PM7 : int  50 89 8 54 31 69 12 30 9 66 ...
#>  $ PM8 : int  21 7 99 42 33 94 5 5 4 11 ...
#>  $ PM9 : int  22 56 58 55 99 96 5 52 47 55 ...
#>  $ PM10: int  84 84 55 98 73 47 13 5 63 3 ...
#>  $ PM11: int  41 83 91 7 78 32 49 14 92 84 ...
#>  $ PM12: int  16 39 37 15 24 97 56 62 69 100 ...
#>  $ PM13: int  94 69 53 37 70 57 50 51 18 29 ...
#>  $ PM14: int  79 40 11 67 25 54 21 34 59 46 ...
#>  $ PM15: int  5 89 74 34 47 85 29 24 46 98 ...
#>  $ PM16: int  44 22 57 63 7 95 46 66 4 92 ...
#>  $ PM17: int  38 57 48 75 8 28 21 2 84 95 ...

Created on 2023-02-11 with reprex v2.0.2

Upvotes: 1
