Reputation: 21
I am fairly new to R and I want to run multivariate and univariate analyses on my dataset.
I have 32 dependent variables (proteins) and 4 independent variables, all arranged in columns. I want a multivariate analysis that tells me which of the independent variables significantly affect my proteins. Then I want a univariate analysis of each protein to see which IVs are significant for that protein. I want this to be similar to multivariate analysis in SPSS, something like:
model <- glm(df[,c(9:40)]~df[,c(4:8)], family="poisson", data=df)
Is there a way to do this in R?
This is what I want to do for the univariate analysis:
model_univariate <- glm(Protein1 ~ age + bmi + gender + group, family="poisson", data=data)
I tried a for loop so that I wouldn't have to type the formula for each protein one by one, but I keep getting an error:
IV <- df[,c(4:8)]
DV <-df[,c(9:40)]
for (y in DV){
form <- formula(paste(y, "~", IV))
models[[y]] <- glm(form, data = df, family="poisson")
}
error message: Error in terms.formula(formula, data = data) : invalid term in model formula
Upvotes: 1
Views: 3674
Reputation: 46978
Please provide an example dataset in the future. Let's say we have something like this:
df = data.frame(matrix(rpois(4000,20),100,40))
colnames(df)[1:4] = c("c1","c2","c3","c4")
colnames(df)[5:8] = paste0("IV",1:4)
colnames(df)[9:40] = paste0("Protein",1:32)
Define DV and IV:
DV = colnames(df)[9:40]
IV = colnames(df)[5:8]
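Note that DV and IV hold column names here, not the columns themselves. In the question's loop, `for (y in DV)` iterates over the data columns, so `paste()` glues numeric values into the formula string, which is the likely source of the "invalid term in model formula" error. A minimal sketch of how a formula can be built from names instead (the variable names `f` and `f2` are only for illustration):

```r
# Column names as character vectors (same naming as the example data)
IV <- paste0("IV", 1:4)

# reformulate() builds a formula from term labels and a response name
f <- reformulate(IV, response = "Protein1")
print(f)

# Equivalent construction with paste(): the IV names must be collapsed with "+"
f2 <- as.formula(paste("Protein1", "~", paste(IV, collapse = " + ")))
print(f2)
```

Both calls should give the formula Protein1 ~ IV1 + IV2 + IV3 + IV4.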
You have not defined the multivariate response properly. If the intention is to fit all the responses in one go but keep each response separate, the multi-response model you wrote only works for lm, not for glm, which you need for the Poisson family.
If you want to model all the responses together, taking the relationships between them into account, the packages suggested by @eipi10 are indeed the way to go.
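To illustrate what "only works for lm" means, here is a rough sketch of a multi-response fit with lm, using simulated data laid out like the example above. It assumes Gaussian errors, so it is not a replacement for the Poisson models; the names `Y` and `mfit` are introduced only for this illustration.

```r
# Simulated data with the same layout as the example (assumption)
set.seed(1)
df <- data.frame(matrix(rpois(4000, 20), 100, 40))
colnames(df)[5:8]  <- paste0("IV", 1:4)
colnames(df)[9:40] <- paste0("Protein", 1:32)
DV <- colnames(df)[9:40]
IV <- colnames(df)[5:8]

# A matrix on the left-hand side makes lm() fit one regression per column
Y <- as.matrix(df[, DV])
mfit <- lm(reformulate(IV, response = "Y"), data = df)

class(mfit)      # "mlm" "lm"
dim(coef(mfit))  # 5 rows (intercept + 4 IVs), 32 columns (one per protein)
```

Each column of `coef(mfit)` matches what a separate lm on that protein would give; the responses are fitted together but not modelled jointly.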
Univariate:
models = vector("list",length(DV))
names(models) = DV
for (y in DV){
form <- reformulate(response=y,IV)
models[[y]] <- glm(form, data = df, family="poisson")
}
models[["Protein1"]]
Call: glm(formula = form, family = "poisson", data = df)
Coefficients:
(Intercept)          IV1          IV2          IV3          IV4
  3.3793743   -0.0063736   -0.0094023   -0.0026646   -0.0007955
Degrees of Freedom: 99 Total (i.e. Null); 95 Residual
Null Deviance: 123.4
Residual Deviance: 118.4 AIC: 609.6
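To see which IV is significant for each protein, the p-values can be pulled out of every fitted model and collected in one table. The following is a sketch on simulated data like the example above; `pvals` and `padj` are names made up for illustration, and the Benjamini-Hochberg adjustment is an optional extra, not something the question asked for.

```r
# Simulated data with the same layout as the example (assumption)
set.seed(1)
df <- data.frame(matrix(rpois(4000, 20), 100, 40))
colnames(df)[5:8]  <- paste0("IV", 1:4)
colnames(df)[9:40] <- paste0("Protein", 1:32)
DV <- colnames(df)[9:40]
IV <- colnames(df)[5:8]

# One Poisson GLM per protein, stored in a list named by protein
models <- lapply(setNames(DV, DV), function(y) {
  glm(reformulate(IV, response = y), data = df, family = "poisson")
})

# Rows = proteins, columns = IVs; entries are the Wald p-values from summary()
pvals <- t(sapply(models, function(m) coef(summary(m))[IV, "Pr(>|z|)"]))

# Optional: adjust for the 32 x 4 tests (p.adjust drops dims, so rebuild the matrix)
padj <- matrix(p.adjust(pvals, method = "BH"),
               nrow = nrow(pvals), dimnames = dimnames(pvals))

head(round(pvals, 3))
```

With real data, `which(padj < 0.05, arr.ind = TRUE)` would list the protein/IV pairs that pass the chosen threshold.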
Upvotes: 1