Reputation: 2319
Let's say that I have two variables, A: {1,2,3,4,5,6,7,8,9,10} and B: {11,12,13,14,15,16,17,18,19,20}, and I want to run a regression in R using only the observations that have A > 6, i.e. to run the regression on {7,8,9,10} and {17,18,19,20}.
In Stata this is easy: reg A B if A>6, but in R I cannot find an easy way to do it (I use the lm function).
Please note that I am new to R and can use only base R; I am not allowed to install any packages. Thanks in advance.
Upvotes: 0
Views: 5912
Reputation: 244
It's probably best to store your variables together in the same object, and for that object to be a data frame. That way the approach extends naturally to multiple regression, and any reordering or subsetting of the rows applies to all the variables at once.
So to answer your question:
df <- data.frame(A = 1:10, B = 11:20)
lm(A ~ B, data = df[df$A > 6, ])
or using the subset function:
lm(A ~ B, data = subset(df, A > 6))
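As a quick check that the two forms are interchangeable, here is a minimal sketch using the example data from the question. The object names fit_index and fit_subset are just illustrative.

```r
# Example data from the question
df <- data.frame(A = 1:10, B = 11:20)

# Two ways to keep only the rows with A > 6
fit_index  <- lm(A ~ B, data = df[df$A > 6, ])      # logical row indexing
fit_subset <- lm(A ~ B, data = subset(df, A > 6))   # subset() function

# Number of observations each fit actually used
nobs(fit_index)
nobs(fit_subset)

# Compare the estimated coefficients of the two fits
all.equal(coef(fit_index), coef(fit_subset))
```

Note that in this toy data B is exactly A + 10, so the fit is perfect and summary() may warn about that; with real data this would not normally happen.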
Upvotes: 1
Reputation: 5314
You can use the subset argument of lm, like this:
lm(A ~ B, subset = A > 6 )
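The call above assumes A and B exist as vectors in your workspace. If they live in a data frame instead, the same argument works together with data; a small sketch, where the names df and fit are only for illustration:

```r
# Example data from the question, stored in a data frame
df <- data.frame(A = 1:10, B = 11:20)

# subset is evaluated inside `data`, so A here refers to df$A
fit <- lm(A ~ B, data = df, subset = A > 6)

nobs(fit)  # number of observations actually used in the fit
coef(fit)  # estimated intercept and slope
```

This is probably the closest base R equivalent to Stata's if qualifier, since the condition is passed directly to lm and the original data stay untouched.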
Upvotes: 3