Reputation: 79
I have a dataset that is organized as below, with data for 23 dates (shown below are data for 1 date and part of a second date - note: the headings are offset). I would like to run linear models of y~x (lm(y~x)) by FrontBack and date (i.e., 2 lm's for each date, 1 for Front and 1 for Back). I would then like to summarize the output in a matrix where the slope, intercept, and errors for each of these are listed in separate columns. This should give me a matrix with 46 rows (23 dates x 2 levels of front/back).
Date FrontBack y x
20140916 Back 2234.580 2.253175
20140916 Back 2267.631 7.725422
20140916 Back 2246.668 14.414951
20140916 Back 2216.307 17.837861
20140916 Back 2214.225 15.484364
20140916 Front 2245.522 90.062102
20140916 Back 2219.565 12.326474
20140916 Front 2267.427 63.396137
20140916 Back 2213.286 7.861758
20140916 Front 2264.902 61.661650
20140916 Front 2256.183 70.124702
20140916 Back 2202.254 7.400539
20140916 Front 2241.997 44.826769
20140916 Back 2204.868 5.739663
20140916 Back 2209.424 2.165606
20140916 Front 2266.947 1.068334
20140917 Back 2237.199 2.190785
20140917 Back 2248.541 4.415886
20140917 Back 2260.041 8.724817
20140917 Back 2277.407 13.420694
20140917 Back 2278.414 14.789667
20140917 Front 2346.622 29.878672
20140917 Back 2268.111 15.120095
20140917 Front 2496.946 60.30390
Upvotes: 1
Views: 833
Reputation: 7232
For example:
# break data frame to list of data frames
df_list <- split(df1, f = paste(df1$Date, df1$FrontBack) )
# run lm models on each data frame
models <- lapply(df_list, function(dat) {
lm(y ~ x, data = dat)
})
# format result
res <-
do.call(
rbind,
lapply(models, function(x) {
coefs <- coef(summary(x))
c( intercept = coefs["(Intercept)", "Estimate"],
slope = coefs["x", "Estimate"],
p_value = coefs["x","Pr(>|t|)"],
InterceptP = coefs["(Intercept)","Pr(>|t|)"],
StdErrorX = coefs["x","Std. Error"],
StdErrorI = coefs["(Intercept)","Std. Error"],
r2 = x$r.squared,
AIC = AIC(x)
)
})
)
Upvotes: 1
Reputation: 996
Because you only have two categories, would subsetting the front and back portions be ok? If so, the following code should give you what you were looking for:
fake.y=runif(15*5,min=2200,max=2500) #make some fake data for testing
fake.FrontBack<-sample(c("Front","Back"),15*5,replace=T)
fake.Date<-sort(sample(seq(20140916,20140920),15*5,replace=T))
fake.data<-data.frame(Date=fake.Date,FrontBack=fake.FrontBack,y=fake.y,x=fake.y/120+runif(15*5)+1*(fake.FrontBack=="Back"))
# starting here use your data instead of fake.data
f<-subset(fake.data,FrontBack=="Front") # use subset to select front/back
b<-subset(fake.data,FrontBack=="Back")
f.fit<-lm(y~x,data=f)# regress front values
f.resid<-data.frame(Date=f$Date,resid=f.fit$residuals,int=f.fit$coefficients[1],slope=f.fit$coefficients[2],FrontBack="Front") # make dataframe of residuals
b.fit<-lm(y~x,data=b)# regress back values
b.resid<-data.frame(Date=b$Date,resid=b.fit$residuals,int=b.fit$coefficients[1],slope=b.fit$coefficients[2],FrontBack="Back") # make dataframe of residuals
all.resid<-rbind(f.resid,b.resid) # stick 'em together
head(all.resid) # should be what you wanted, residual error for all entires
# could be what you wanted, aggregate mean error for each date, front and back
ag<-aggregate(resid~.,data=all.resid,mean)
print(ag)
the code will spit out something like this, to show the model error residual for each row in your original data:
Date resid int slope FrontBack
1 20140916 -47.91676 393.5386 97.52173 Front
3 20140916 52.01027 393.5386 97.52173 Front
4 20140916 52.58631 393.5386 97.52173 Front
5 20140916 -17.56038 393.5386 97.52173 Front
6 20140916 -21.85633 393.5386 97.52173 Front
9 20140916 44.97382 393.5386 97.52173 Front
If you wanted the average error for each day, the aggregated data will be:
Date int slope FrontBack resid
1 20140916 393.5386 97.52173 Front 10.372821
2 20140917 393.5386 97.52173 Front -3.840699
3 20140918 393.5386 97.52173 Front -10.876092
4 20140919 393.5386 97.52173 Front -2.159973
5 20140920 393.5386 97.52173 Front 8.878526
6 20140916 212.6598 101.60367 Back 2.862525
7 20140917 212.6598 101.60367 Back -14.476662
8 20140918 212.6598 101.60367 Back 10.822712
9 20140919 212.6598 101.60367 Back -10.072473
10 20140920 212.6598 101.60367 Back 10.472516
With a separate front and back entries for each date, the subsetting and regression should work in the same for your data as in this example
Upvotes: 1