zhaoy

Reputation: 442

linear regression on equal-size groups of rows in data-frame

I have a data-frame with 10000 rows and 2 columns, y and x. The rows represent 500 samples, each consisting of 20 consecutive (x, y) observations.

How can I perform linear regression on each sample (each group of 20 rows) so that I can store the estimated coefficient in a separate 500-row data structure?

I know I can run summary(lm(y ~ x))$coefficients[2, 1] to get the estimated slope from a single regression fitted to all rows of the data-frame. However, what I want is one estimated coefficient per sample, not one for the whole data-frame.

Upvotes: 0

Views: 191

Answers (1)

MrFlick

Reputation: 206456

You can use by() to perform regressions on different subsets if you supply a grouping variable that identifies the subset to which each row belongs. First, some sample data:

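N <- 10000                                    # total number of rows
n <- 20                                       # rows per sample
dd <- data.frame(x = runif(N))                # predictor drawn uniformly from [0, 1]
dd <- transform(dd, y = 4 - 2*x + rnorm(N))   # true intercept 4, true slope -2, plus noise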

Now, to fit the model to each consecutive group of n rows:

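# Label rows 1-20 as group 1, rows 21-40 as group 2, and so on
grp <- rep(seq_len(N/n), each = n)
# Fit lm() within each group, then collect the coefficients as one row per group
fits <- t(sapply(by(dd, grp, function(d) lm(y ~ x, data = d)), coef))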
head(fits)
#   (Intercept)          x
# 1    4.025626 -2.3476841
# 2    4.684731 -3.0566627
# 3    4.011690 -1.8731735
# 4    3.788382 -1.9182377
# 5    3.461123 -1.0965173
# 6    3.671282 -0.9247785
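
If only the slope is needed, stored in a 500-row data frame as the question asks, a variant along the following lines may work. This is a minimal sketch rather than part of the original answer: the column name sample_id, the object names slopes and result, and the call to set.seed() are assumptions introduced for illustration, and split() with vapply() is swapped in for by() with sapply().

```r
# Example data with the same shape as above: 500 samples of 20 rows each
set.seed(1)                       # assumption: seed added only for reproducibility
N <- 10000
n <- 20
dd <- data.frame(x = runif(N))
dd$y <- 4 - 2 * dd$x + rnorm(N)

# Explicit grouping column (hypothetical name): rows 1-20 -> 1, rows 21-40 -> 2, ...
dd$sample_id <- rep(seq_len(N / n), each = n)

# Fit one regression per sample and keep only the slope on x
slopes <- vapply(
  split(dd, dd$sample_id),
  function(d) coef(lm(y ~ x, data = d))[["x"]],
  numeric(1)
)

# One row per sample: the sample identifier and its estimated slope
result <- data.frame(sample_id = as.integer(names(slopes)),
                     slope = unname(slopes))
head(result)
```

Using vapply() with numeric(1) makes the function fail loudly if any group does not return exactly one number, which sapply() would not do. Storing sample_id as a real column also keeps the link between each slope and its sample explicit if the rows are later reordered.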

Upvotes: 1
