clumbzy1

Reputation: 105

Running a regression

Background: my data set, foo, has 52 rows and 12 columns (assume the column names are A through L).

I am told to run a regression where foo$L is the dependent variable, and all other variables are independent except for foo$K.

The way I was doing it is

fit <- lm(foo$L ~ foo$A + ... + foo$J)

then calling

summary(fit)

Is this a good way to run a regression and find the intercept and coefficients?

Upvotes: 0

Views: 156

Answers (2)

Marius

Reputation: 60070

Use the data argument to lm so you don't have to use the foo$ syntax for each predictor. Use dependent ~ . as the formula to have the dependent variable predicted by all other variables. Then you can use - K to exclude K:

data_mat = matrix(rnorm(52 * 12), nrow = 52)

df = as.data.frame(data_mat)
colnames(df) = LETTERS[1:12]

lm(L ~ . - K, data = df)
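To get at the intercept and coefficients the question asks about, the fitted model can be stored and queried with coef(). The following is a minimal sketch, assuming a simulated stand-in for foo (the seed and the random data are assumptions for illustration, not the asker's data):

```r
# Assumption: simulated stand-in for the asker's 52 x 12 data set
set.seed(1)
foo <- as.data.frame(matrix(rnorm(52 * 12), nrow = 52))
colnames(foo) <- LETTERS[1:12]

# L is the response; every other column except K is a predictor
fit <- lm(L ~ . - K, data = foo)

coef(fit)                 # named vector: intercept plus one slope per predictor
coef(fit)["(Intercept)"]  # the intercept on its own
coef(summary(fit))        # matrix of estimates, std. errors, t values, p-values
```

Because the formula excludes K, the coefficient vector should contain the intercept and the ten predictors A through J only.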

Upvotes: 3

www

Reputation: 39154

You can first remove the column K and then run fit <- lm(L ~ ., data = foo). This treats the L column as the dependent variable and all the other columns as independent variables, so you don't have to name each column in the formula.

Here is an example using the mtcars data set, fitting a multiple regression model to mpg with all the other variables except carb.

mtcars2 <- mtcars[, !names(mtcars) %in% "carb"]

fit <- lm(mpg ~ ., data = mtcars2)

summary(fit)

# Call:
#   lm(formula = mpg ~ ., data = mtcars2)
# 
# Residuals:
#     Min      1Q  Median      3Q     Max 
# -3.3038 -1.6964 -0.1796  1.1802  4.7245 
# 
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)   
# (Intercept) 12.83084   18.18671   0.706  0.48790   
# cyl         -0.16881    0.99544  -0.170  0.86689   
# disp         0.01623    0.01290   1.259  0.22137   
# hp          -0.02424    0.01811  -1.339  0.19428   
# drat         0.70590    1.56553   0.451  0.65647   
# wt          -4.03214    1.33252  -3.026  0.00621 **
# qsec         0.86829    0.68874   1.261  0.22063   
# vs           0.36470    2.05009   0.178  0.86043   
# am           2.55093    2.00826   1.270  0.21728   
# gear         0.50294    1.32287   0.380  0.70745   
# ---
#   Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 2.593 on 22 degrees of freedom
# Multiple R-squared:  0.8687,  Adjusted R-squared:  0.8149 
# F-statistic: 16.17 on 9 and 22 DF,  p-value: 9.244e-08
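As a quick check, dropping the column first and excluding it in the formula should describe the same model. The sketch below compares the two fits on mtcars and pulls out individual estimates by name; fit_alt is a name introduced here for illustration only.

```r
# Approach 1: drop carb from the data, then regress mpg on everything left
mtcars2 <- mtcars[, !names(mtcars) %in% "carb"]
fit <- lm(mpg ~ ., data = mtcars2)

# Approach 2: keep the data as-is and exclude carb in the formula instead
fit_alt <- lm(mpg ~ . - carb, data = mtcars)

# Compare the coefficient vectors of the two fits
same_coefs <- all.equal(coef(fit), coef(fit_alt))
print(same_coefs)

# Pull single estimates by name rather than reading them off summary()
coef(fit)["(Intercept)"]
coef(fit)["wt"]
```

Subsetting the data keeps the formula short, while the formula approach avoids creating a second copy of the data frame.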

Upvotes: 0
