Reputation: 13
I have collected non-tidy data from different studies. Assume each study reports the number of patients who received treatment_x, a treatment outcome given as a percentage (stored here as a proportion), and the number of recurrences of the treated disease.
# library() attaches one package per call; a second positional argument is read as `help`
library(tidyverse)
library(gtsummary)
data <- data.frame(
  study_id = c(1, 2, 3, 4, 5),
  no_patients = c(10, 15, 20, 23, 16),
  treatment_id = c("surgery", "radiotherapy", "surgery", "radiotherapy", "surgery"),
  treatment_outcome = c(0.88, 0.50, 0.90, 0.23, 0.67),
  recurrence = c(0, 2, 4, 3, 6)
)
I want to report this data in a table and compare the different treatment methods:
data %>%
  select(-study_id) %>%
  tbl_summary(
    by = treatment_id,
    type = list(
      no_patients ~ "continuous",
      recurrence ~ "continuous",
      treatment_outcome ~ "continuous"
    ),
    statistic = list(
      c("no_patients") ~ "{sum}"
    )
  ) %>%
  add_p()
Because the studies report on different numbers of patients, they are not equally important: study 4, with 23 patients, should be weighted more heavily than the studies with fewer patients.
I could expand this table to a long format with one row per patient:
data.long <- data[rep(row.names(data), data$no_patients), ]
Now the data has one row per patient, but each study's total number of recurrences is attributed to every individual patient. I could also divide the recurrence column by the number of patients. However, my actual dataset is far more complicated and has many more variables.
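To illustrate the per-patient expansion without attributing the study total to every row, here is a minimal sketch. It assumes tidyr is available and that it does not matter which patients within a study are flagged, since only the counts are known. The names patients and recurred are hypothetical and used only for illustration.

```r
library(dplyr)
library(tidyr)

data <- data.frame(
  study_id = c(1, 2, 3, 4, 5),
  no_patients = c(10, 15, 20, 23, 16),
  treatment_id = c("surgery", "radiotherapy", "surgery", "radiotherapy", "surgery"),
  treatment_outcome = c(0.88, 0.50, 0.90, 0.23, 0.67),
  recurrence = c(0, 2, 4, 3, 6)
)

# One row per patient; flag the first `recurrence` rows of each study as
# recurred (arbitrary assignment, only the per-study count is preserved)
patients <- data |>
  uncount(no_patients, .remove = FALSE) |>
  group_by(study_id) |>
  mutate(recurred = row_number() <= recurrence) |>
  ungroup()

# Per-treatment recurrence counts now reflect individual patients
patients |>
  count(treatment_id, recurred)
```

This only works for count-type variables such as recurrence; a study-level summary like treatment_outcome still cannot be split across individual patients this way.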
My questions:
Upvotes: 1
Views: 86
Reputation: 6911
You could group_by treatment_id and use prop.table to obtain groupwise weights:
library(dplyr)
data |>
  group_by(treatment_id) |>
  # prop.table() gives each study's share of the patients within its treatment group
  mutate(
    treatment_outcome_weighted = treatment_outcome * prop.table(no_patients)
  )
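The per-row products above add up to the patient-weighted mean within each treatment group. As a sketch of how they might be collapsed into one row per treatment, the example below uses summarise with base R's weighted.mean; the pooled recurrence rate is an added assumption about what is wanted, and the column names total_patients, outcome_weighted and recurrence_rate are hypothetical.

```r
library(dplyr)

data <- data.frame(
  study_id = c(1, 2, 3, 4, 5),
  no_patients = c(10, 15, 20, 23, 16),
  treatment_id = c("surgery", "radiotherapy", "surgery", "radiotherapy", "surgery"),
  treatment_outcome = c(0.88, 0.50, 0.90, 0.23, 0.67),
  recurrence = c(0, 2, 4, 3, 6)
)

weighted_summary <- data |>
  group_by(treatment_id) |>
  summarise(
    total_patients = sum(no_patients),
    # Outcome averaged across studies, weighted by study size
    outcome_weighted = weighted.mean(treatment_outcome, w = no_patients),
    # Pooled recurrences per patient across all studies of a treatment
    recurrence_rate = sum(recurrence) / sum(no_patients)
  )

weighted_summary
```

Pooling counts before dividing weights larger studies more heavily automatically, which is why recurrence_rate needs no explicit weights.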
Upvotes: 1