Reputation: 2319

Running a regression on a subset of observations using R

Let's say that I have two variables, A: {1,2,3,4,5,6,7,8,9,10} and B: {11,12,13,14,15,16,17,18,19,20}, and I want to run a regression in R using only the observations that have A>6, i.e. using {7,8,9,10} and {17,18,19,20}.

In Stata this is easy: reg A B if A>6. In R, however, I cannot find an easy way to do it (I use the lm command).

Please note that I am new to R and can only use vanilla R; I am not allowed to install any packages. Thanks in advance.

Upvotes: 0

Answers (3)

Dalton Hance

Reputation: 244

It's probably best to store your variables together in a single object, preferably a data frame. That way the approach extends naturally to multiple regression, and any reordering or subsetting of the rows applies to all of your variables at once.

So to answer your question:

df <- data.frame(A = 1:10, B = 11:20)
lm(A ~ B, data = df[df$A > 6, ])

or using the subset function:

lm(A ~ B, data = subset(df, A > 6))
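As a quick check, the following sketch uses the toy data from the question to confirm that both forms keep the same four rows and give the same coefficients. The object names (by_index, by_subset, fit_index, fit_subset) are made up for illustration.

```r
# Toy data from the question
df <- data.frame(A = 1:10, B = 11:20)

# Two ways of keeping only the rows with A > 6
by_index  <- df[df$A > 6, ]      # logical row indexing
by_subset <- subset(df, A > 6)   # subset() from base R

# Fit the same model on each subset
fit_index  <- lm(A ~ B, data = by_index)
fit_subset <- lm(A ~ B, data = by_subset)

print(nrow(by_index))                                # rows kept by indexing
print(identical(by_index, by_subset))                # same rows either way?
print(all.equal(coef(fit_index), coef(fit_subset)))  # same coefficients?
```

One difference worth knowing if the data contain missing values: logical indexing returns an all-NA row wherever the condition evaluates to NA, whereas subset() treats NA as FALSE and drops those rows.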

Upvotes: 1

Pierre L

Reputation: 28441

You can subset each variable directly in the formula using a logical index:

lm(A[A>6] ~ B[A>6])
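A minimal sketch of the same idea with A and B as plain vectors, as in the question. Storing the condition once in a helper variable (keep, a name chosen here for illustration) avoids repeating it and guarantees that both variables are filtered identically.

```r
# Vectors from the question
A <- 1:10
B <- 11:20

# Compute the logical index once and reuse it for both variables
keep <- A > 6

fit <- lm(A[keep] ~ B[keep])

print(sum(keep))   # number of observations used
print(coef(fit))   # the slope is labelled with the indexed expression
```

Because the indexing is part of the formula, the coefficient is labelled with the indexed expression rather than plain B, which can make later steps such as predict() with new data awkward. That is one reason to prefer a data frame.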

Upvotes: 1

Mamoun Benghezal

Reputation: 5314

You can use the subset argument of lm like this:

lm(A ~ B, subset = A > 6 )
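This is the closest equivalent of Stata's if qualifier. The sketch below, using the question's toy data, shows it both with free-standing vectors and with a data frame, and uses nobs() to confirm that only four observations enter the fit. The names df, fit_vec and fit_df are illustrative.

```r
# Vectors from the question
A <- 1:10
B <- 11:20

# subset is evaluated in the same place as the formula variables
fit_vec <- lm(A ~ B, subset = A > 6)

# The same call with the variables held in a data frame
df <- data.frame(A = A, B = B)
fit_df <- lm(A ~ B, data = df, subset = A > 6)

print(nobs(fit_vec))                           # observations used in the fit
print(all.equal(coef(fit_vec), coef(fit_df)))  # both calls agree
```

Since the formula stays A ~ B, the coefficient names remain clean and the condition is the only thing that changes from one model to the next.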

Upvotes: 3
