Convert cox regression table to forest plot

Question

I want to convert a cox table to forest plot as showed below. Unforunatly I’ve lost my original data (coxph object) so I have to use the data from the table. Data below are just examples:

Desired output:

Reprex for the two tables:

GRP1<-tibble::tribble(
    ~Variable,   ~Level,        ~Number,         ~`HR.(univariable)`,       ~`HR.(multivariable)`,
        "Sex", "Female", "2204 (100.0)",                          NA,                          NA,
           NA,   "Male", "2318 (100.0)", "1.13 (0.91-1.40, p=0.265)", "1.13 (0.91-1.40, p=0.276)",
      "Score",      "1", "2401 (100.0)",                          NA,                          NA,
           NA,    "1-2", "1637 (100.0)", "1.49 (1.19-1.87, p=0.001)", "1.15 (0.90-1.47, p=0.250)",
           NA,    "3-4",  "412 (100.0)", "1.71 (1.14-2.56, p=0.010)", "1.09 (0.71-1.67, p=0.710)",
           NA,    ">=5",   "42 (100.0)", "1.67 (0.53-5.21, p=0.381)", "0.96 (0.30-3.05, p=0.943)",
  "Treatment",      "A", "1572 (100.0)",                          NA,                          NA,
           NA,      "B", "2951 (100.0)", "1.74 (1.26-2.40, p=0.001)", "1.53 (1.09-2.13, p=0.013)"
  )


GRP2<-tibble::tribble(
    ~Variable,   ~Level,        ~Number,          ~`HR.(univariable)`,          ~`HR.(univariable)`,
        "Sex", "Female", "2204 (100.0)",                           NA,                           NA,
           NA,   "Male", "2318 (100.0)",  "1.70 (1.36-2.13, p<0.001)",  "1.62 (1.28-2.04, p<0.001)",
      "Score",      "1", "2401 (100.0)",                           NA,                           NA,
           NA,    "1-2", "1637 (100.0)",  "2.76 (1.21-6.29, p=0.016)",  "2.69 (1.18-6.13, p=0.019)",
           NA,    "3-4",  "412 (100.0)", "5.11 (2.26-11.58, p<0.001)", "4.46 (1.95-10.23, p<0.001)",
           NA,    ">=5",   "42 (100.0)", "5.05 (2.19-11.64, p<0.001)",  "4.08 (1.73-9.59, p=0.001)",
  "Treatment",      "A", "1572 (100.0)",                           NA,                           NA,
           NA,      "B", "2951 (100.0)",  "1.48 (1.16-1.88, p=0.001)",  "1.23 (0.95-1.59, p=0.114)"
  )

Is it doable?

Best regards, H

Allan Cameron · Accepted Answer

The difficult thing about this task is not making the plot; it is converting your data from a bunch of text strings into a single long-format data frame that can be used for plotting. This involves using regular expressions to capture the appropriate number for each column, pivoting the result, then repeating that process for the second data frame before binding the two frames together. This is unavoidably ugly and complicated, but that is one of the reasons why having data stored in the correct format is so important.

Anyway, the following code performs the necessary operations:

library(dplyr)

wrangler <- function(data){
   grp <- as.character(match.call()$data)
   data %>%
   tidyr::fill(Variable) %>% 
   mutate(Variable = paste(Variable, Level), 
          Number = as.numeric(gsub("^(\d+).*$", "\1", Number)),
          univariable_HR =  as.numeric(gsub("^((\d+|\.)+).*$", "\1", `HR.(univariable)`)),
          univariable_lower = as.numeric(gsub("^.+? $(.+?)-.*$", "\1", `HR.(univariable)`)),
          univariable_upper = as.numeric(gsub("^.+?-(.+?),.*$", "\1", `HR.(univariable)`)),
          univariable_p = gsub("^.+?p=*(.+?)$.*$", "\1", `HR.(univariable)`),
          multivariable_HR =  as.numeric(gsub("^((\d+|\.)+).*$", "\1", `HR.(multivariable)`)),
          multivariable_lower = as.numeric(gsub("^.+? $(.+?)-.*$", "\1", `HR.(multivariable)`)),
          multivariable_upper = as.numeric(gsub("^.+?-(.+?),.*$", "\1", `HR.(multivariable)`)),
          multivariable_p = gsub("^.+?p=*(.+?)$.*$", "\1", `HR.(multivariable)`),
          group = grp) %>% 
   filter(!is.na(univariable_HR)) %>%
   select(-Level, -`HR.(multivariable)`, - `HR.(univariable)`) %>%
   tidyr::pivot_longer(cols = -(c(1:2, 11)), names_sep = "_", names_to = c("type", ".value"))
}

df <- rbind(wrangler(GRP1), wrangler(GRP2))

This now gives us the data in the correct format for plotting. Each row will become a single pointrange in our plot, so it needs a hazard ratio, a lower confidence bound, an upper confidence bound, a variable label, the type (multivariable versus univariable), and the group it originally came from (GRP1 or GRP2):

df
#> # A tibble: 20 x 8
#>    Variable    Number group type             HR lower upper p     
#>                           
#>  1 Sex Male      2318 GRP1  univariable    1.13  0.91  1.4  0.265 
#>  2 Sex Male      2318 GRP1  multivariable  1.13  0.91  1.4  0.276 
#>  3 Score 1-2     1637 GRP1  univariable    1.49  1.19  1.87 0.001 
#>  4 Score 1-2     1637 GRP1  multivariable  1.15  0.9   1.47 0.250 
#>  5 Score 3-4      412 GRP1  univariable    1.71  1.14  2.56 0.010 
#>  6 Score 3-4      412 GRP1  multivariable  1.09  0.71  1.67 0.710 
#>  7 Score >=5       42 GRP1  univariable    1.67  0.53  5.21 0.381 
#>  8 Score >=5       42 GRP1  multivariable  0.96  0.3   3.05 0.943 
#>  9 Treatment B   2951 GRP1  univariable    1.74  1.26  2.4  0.001 
#> 10 Treatment B   2951 GRP1  multivariable  1.53  1.09  2.13 0.013 
#> 11 Sex Male      2318 GRP2  univariable    1.7   1.36  2.13 <0.001
#> 12 Sex Male      2318 GRP2  multivariable  1.62  1.28  2.04 <0.001
#> 13 Score 1-2     1637 GRP2  univariable    2.76  1.21  6.29 0.016 
#> 14 Score 1-2     1637 GRP2  multivariable  2.69  1.18  6.13 0.019 
#> 15 Score 3-4      412 GRP2  univariable    5.11  2.26 11.6  <0.001
#> 16 Score 3-4      412 GRP2  multivariable  4.46  1.95 10.2  <0.001
#> 17 Score >=5       42 GRP2  univariable    5.05  2.19 11.6  <0.001
#> 18 Score >=5       42 GRP2  multivariable  4.08  1.73  9.59 0.001 
#> 19 Treatment B   2951 GRP2  univariable    1.48  1.16  1.88 0.001 
#> 20 Treatment B   2951 GRP2  multivariable  1.23  0.95  1.59 0.114

Now that we have the data in this format, the plot itself is straightforward:

library(ggplot2)

ggplot(df, aes(HR, Variable)) +
   geom_pointrange(aes(xmin = lower, xmax = upper, colour = type),
                   position = position_dodge(width = 0.5)) +
   facet_grid(group~., switch = "y") +
   geom_vline(xintercept = 0, linetype = 2) +
   theme_bw() +
   theme(strip.placement = "outside",
         strip.text= element_text(angle = 180),
         strip.background = element_blank(),
         panel.spacing = unit(0, "mm"))

^{Created on 2021-11-01 by the reprex package (v2.0.0)}

Convert cox regression table to forest plot

Answers (1)

Related Questions