RRookie

Reputation: 21

Is there a way to perform multivariate glm in R (Multiple DV and Multiple IV)?

I am fairly new to R and I want to do multivariate and univariate analyses of my dataset.

Dataset example: [image not included]

I have 32 dependent variables and 4 independent variables, all arranged in columns. I want a multivariate analysis that tells me which of the independent variables significantly affect my proteins. Then I want a univariate analysis on each protein to see which IVs are significant for that protein. I want this to be similar to the multivariate analysis in SPSS, something like:

model <- glm(df[,c(9:40)]~df[,c(4:8)], family="poisson", data=df)

Is there a way to do this in R?

This is what I want to do for the univariate analysis:

model_univariate <- glm(Protein1 ~ age + bmi+ gender+ group, family="poisson", data=data)

I tried a for loop so that I wouldn't have to type the formula for each protein one by one, but I keep getting an error:

IV <- df[,c(4:8)]

DV <-df[,c(9:40)]

for (y in DV){
form <- formula(paste(y, "~", IV))
models[[y]] <- glm(form, data = df, family="poisson") 
}

error message: Error in terms.formula(formula, data = data) : invalid term in model formula

Upvotes: 1

Views: 3674

Answers (1)

StupidWolf

Reputation: 46978

Please provide an example dataset in the future. Let's say we have something like this:

df = data.frame(matrix(rpois(4000,20),100,40))
colnames(df)[1:4] = c("c1","c2","c3","c4")
colnames(df)[5:8] = paste0("IV",1:4)
colnames(df)[9:40] = paste0("Protein",1:32)

Define DV and IV:

DV = colnames(df)[9:40]
IV = colnames(df)[5:8]

You have not defined the multivariate response properly. If the intention is to fit all the responses in one go but keep each response separate, the multi-response formula (a matrix of responses on the left-hand side) only works for lm, not for glm, which you need for the Poisson family.
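
For comparison, here is a minimal sketch of what the multi-response form looks like with lm. It assumes simulated data with the same column layout as above; the seed and the name fit_lm are only for illustration. Note that this fits Gaussian models, not Poisson ones, so it is not a substitute for the glm loop.

```r
# Simulated stand-in for the real data (layout and names are assumptions).
set.seed(1)
df <- data.frame(matrix(rpois(4000, 20), 100, 40))
colnames(df)[5:8]  <- paste0("IV", 1:4)
colnames(df)[9:40] <- paste0("Protein", 1:32)

# A matrix on the left-hand side makes lm() fit one Gaussian
# regression per response column, all sharing the same predictors.
fit_lm <- lm(as.matrix(df[, 9:40]) ~ IV1 + IV2 + IV3 + IV4, data = df)

class(fit_lm)      # includes "mlm" for a multi-response fit
dim(coef(fit_lm))  # rows: intercept + 4 IVs, columns: 32 proteins
```

Each column of coef(fit_lm) holds the coefficients for one protein, which is the "fit all at once but keep each response separate" behaviour described above.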

If you want to model all the responses jointly, taking the relationships between them into account, the packages suggested by @eipi10 are indeed the way to go.

Univariate:

models = vector("list",length(DV))
names(models) = DV

for (y in DV){
form <- reformulate(response=y,IV)
models[[y]] <- glm(form, data = df, family="poisson") 
}

models[["Protein1"]]

Call:  glm(formula = form, family = "poisson", data = df)

Coefficients:
(Intercept)          IV1          IV2          IV3          IV4  
  3.3793743   -0.0063736   -0.0094023   -0.0026646   -0.0007955  

Degrees of Freedom: 99 Total (i.e. Null);  95 Residual
Null Deviance:      123.4 
Residual Deviance: 118.4    AIC: 609.6
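
To see which IV is significant for each protein, one option is to collect the Wald p-values from every fitted model into a single table. This is only a sketch under the same simulated setup; the seed and the name pvals are assumptions, and with random data no IV is expected to be consistently significant.

```r
# Rebuild the simulated data and the per-protein Poisson models.
set.seed(1)
df <- data.frame(matrix(rpois(4000, 20), 100, 40))
colnames(df)[5:8]  <- paste0("IV", 1:4)
colnames(df)[9:40] <- paste0("Protein", 1:32)
DV <- colnames(df)[9:40]
IV <- colnames(df)[5:8]

models <- lapply(DV, function(y) {
  glm(reformulate(IV, response = y), data = df, family = "poisson")
})
names(models) <- DV

# Pull the Wald p-value of each IV out of every model summary.
# Result: one row per protein, one column per IV.
pvals <- t(sapply(models, function(m) coef(summary(m))[IV, "Pr(>|z|)"]))

head(round(pvals, 3))
```

Because this runs 32 x 4 tests, the raw p-values should be read with multiple testing in mind.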

Upvotes: 1
