Reputation: 1
I have data on the effect sizes for 14 variables (var1-var14). Each value is the effect size of a specific treatment on a certain variable. Values are missing because some articles did not consider certain variables. A positive value indicates that the treatment promotes the variable, while a negative value indicates that it inhibits it. I want to (1) run a pairwise linear regression across all variables to check for associations between them, and (2) treat var1 as the dependent variable and var2-var14 as independent variables to find the best-fit model (maybe using the glmulti package?) and show which variables matter most for changes in var1.
Here is some sample data:
set.seed(123)
# Create the dataset with effect sizes and missing values
mydata <- data.frame(
  Var1 = sample(c(-20:14, NA), 64, replace = TRUE),
  Var2 = sample(c(-20:14, NA), 64, replace = TRUE),
  Var3 = sample(c(-20:14, NA), 64, replace = TRUE),
  Var4 = sample(c(-20:14, NA), 64, replace = TRUE),
  Var5 = sample(c(-20:14, NA), 64, replace = TRUE),
  Var6 = sample(c(-20:14, NA), 64, replace = TRUE),
  Var7 = sample(c(-20:14, NA), 64, replace = TRUE),
  Var8 = sample(c(-20:14, NA), 64, replace = TRUE),
  Var9 = sample(c(-20:14, NA), 64, replace = TRUE),
  Var10 = sample(c(-20:14, NA), 64, replace = TRUE),
  Var11 = sample(c(-20:14, NA), 64, replace = TRUE),
  Var12 = sample(c(-20:14, NA), 64, replace = TRUE),
  Var13 = sample(c(-20:14, NA), 64, replace = TRUE),
  Var14 = sample(c(-20:14, NA), 64, replace = TRUE)
)
# Set at least 50% of the values in each column to NA
# (32 of 64 rows, plus any NAs already drawn above)
for (col in 1:14) {
  missing_indices <- sample(1:64, size = 32)
  mydata[missing_indices, col] <- NA
}
Is it possible to do all this with such a dataset (i.e., one with many missing values)? Thanks!
Upvotes: -1
Views: 75
Reputation: 6921
With d being your example data:
d <-
  paste0('Var_', 1:14) |>
  Map(f = \(.) sample(c(-20:14, NA),
                      size = 64,
                      prob = c(rep(.49/35, 35), .51),
                      replace = TRUE)) |>
  as.data.frame()
... you get the pairwise associations in terms of the correlation matrix like so:
d |> cor(use = 'pairwise.complete.obs')
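With this much missing data, each entry of the pairwise-complete correlation matrix rests on a different (and possibly small) set of rows, so it can help to count them. A minimal sketch, assuming a small toy data frame `toy` (a hypothetical stand-in for `d`, not part of the original answer):

```r
# Toy data (hypothetical stand-in for d): 4 variables, 64 rows, about half NA
set.seed(1)
toy <- as.data.frame(replicate(4, sample(c(-20:14, NA), 64, replace = TRUE,
                                         prob = c(rep(.49/35, 35), .51))))

# TRUE where a value is observed; adding 0 turns the logical matrix numeric
observed <- !is.na(toy) + 0

# Entry [i, j] = number of rows in which both variable i and variable j
# are observed; the diagonal holds the per-variable counts
n_pairs <- crossprod(observed)
n_pairs
```

Correlations backed by only a handful of shared rows deserve little trust, whatever their magnitude.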
... and a basic column-wise imputation (replacing NA with the mean value) this way:
d_imputed <- d |>
  apply(2, \(var) replace(var, is.na(var), mean(var, na.rm = TRUE)))
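Finally, you can obtain the regression coefficients of the remaining columns (predictors) for each column (response) like so:
d_imputed <- as.data.frame(d_imputed)
names(d_imputed) |>
  sapply(simplify = FALSE,
         # `.` expands to every column except the response v
         FUN = \(v) coef(lm(as.formula(paste(v, '~ .')), data = d_imputed)))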
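For the pairwise regressions asked about in the question, one option that avoids imputation is to fit a simple `lm()` per ordered pair of variables; by default `lm()` drops rows where either variable is NA. The following is only a sketch on hypothetical toy data (`toy`, `fit_pair` and `pairwise_fits` are illustrative names, not from the original post), and the p-values are unadjusted for multiple testing:

```r
# Toy data (hypothetical stand-in for d): 4 variables, 64 rows, about half NA
set.seed(1)
toy <- as.data.frame(replicate(4, sample(c(-20:14, NA), 64, replace = TRUE,
                                         prob = c(rep(.49/35, 35), .51))))

# All ordered (response, predictor) pairs of distinct variables
pairs <- expand.grid(y = names(toy), x = names(toy), stringsAsFactors = FALSE)
pairs <- pairs[pairs$y != pairs$x, ]

# Fit y ~ x on the rows where both are observed (lm's default na.omit)
fit_pair <- function(y, x) {
  fit <- lm(reformulate(x, response = y), data = toy)
  est <- summary(fit)$coefficients
  data.frame(response  = y,
             predictor = x,
             n         = nobs(fit),            # rows actually used
             slope     = est[2, 'Estimate'],
             p_value   = est[2, 'Pr(>|t|)'])
}

pairwise_fits <- do.call(rbind, Map(fit_pair, pairs$y, pairs$x))
rownames(pairwise_fits) <- NULL
pairwise_fits
```

The `n` column shows how many rows each fit actually used, which matters here because every pair shares a different subset of observations.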
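A word of caution: the above is just a technical answer to your literal question. Mean imputation with more than half of the values missing shrinks each variable's variance and distorts the associations between variables. For a statistically sound solution, I'd recommend researching imputation, dimensionality reduction, predictor selection and related topics over at Cross Validated (see Ben Bolker's comment).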
Upvotes: 1