Reputation: 442
I have a data frame with two columns, y and x, and 10000 rows. The rows make up 500 samples, each with 20 (y, x) pairs.
How can I perform linear regression on each sample (each group of 20 rows) so that I can store the estimated coefficient in a separate 500-row data structure?
I know I can run summary(lm(y ~ x))$coefficients[2, 1]
to get the estimated slope from a single fit over all rows of the data frame. However, I want one estimated coefficient per sample, not one for the whole data frame.
Upvotes: 0
Views: 191
Reputation: 206456
You can use by()
to perform regressions on different subsets if you supply a grouping vector (or a column) that identifies the subset to which each row belongs. First, some sample data:
N <- 10000  # total number of rows
n <- 20     # rows per sample
dd <- data.frame(x = runif(N))
dd <- transform(dd, y = 4 - 2 * x + rnorm(N))  # true intercept 4, true slope -2
Now, to fit the model to each group of n consecutive rows:
groups <- rep(seq_len(N / n), each = n)  # sample id for each row: 1,1,...,2,2,...
fits <- t(sapply(by(dd, groups, function(d) lm(y ~ x, data = d)), coef))
head(fits)
#   (Intercept)          x
# 1    4.025626 -2.3476841
# 2    4.684731 -3.0566627
# 3    4.011690 -1.8731735
# 4    3.788382 -1.9182377
# 5    3.461123 -1.0965173
# 6    3.671282 -0.9247785
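If only the slope is needed in a 500-row structure, as the question asks, the same idea can be sketched with split() and vapply(). This is a minimal sketch, not part of the original answer: the sample_id column and the slope_df name are hypothetical, and it assumes the rows of each sample are stored consecutively.

```r
# Simulated data in the layout described in the question (assumed):
# 500 samples of 20 consecutive rows each.
set.seed(1)
N <- 10000
n <- 20
dd <- data.frame(x = runif(N))
dd$y <- 4 - 2 * dd$x + rnorm(N)

# Hypothetical grouping column: which sample each row belongs to.
dd$sample_id <- rep(seq_len(N / n), each = n)

# Split into one data frame per sample, fit a regression on each,
# and keep only the slope (the coefficient named "x").
slopes <- vapply(
  split(dd, dd$sample_id),
  function(d) coef(lm(y ~ x, data = d))[["x"]],
  numeric(1)
)

# One row per sample.
slope_df <- data.frame(
  sample_id = as.integer(names(slopes)),
  slope = unname(slopes)
)
nrow(slope_df)
```

Using vapply() with numeric(1) makes the call fail loudly if any fit does not return exactly one slope, which is a little safer than sapply() when the result is going straight into a data frame.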
Upvotes: 1