This question builds on the answer that Simon and James provided here.
The dlply function worked well to give me Y estimates within my data subsets. Now, my challenge is getting these Y estimates and residuals back into the original data frame to calculate goodness-of-fit statistics and for further analysis.
I was able to use cbind to convert the dlply output list into a column with one row per subset, but this doesn't quite work. The code and its result are below (sorry about the poor markdown):
model <- function(df){ glm(Y~D+O+A+log(M), family=poisson(link="log"), data=df)}
Modrpt <- ddply(msadata, "Dmsa", function(x)coef(model(x)))
Modest <- cbind(dlply(msadata, "Dmsa", function(x) fitted.values(model(x))))
Subset name | Y_Estimates
-------------------------
Dmsa 1 | c(4353.234, 234.34,...
Dmsa 2 | c(998.234, 2543.55,...
This doesn't really solve the problem, because I need to get the individual Y estimates (separated by commas in the Y_Estimates column of the Modest data frame) into my msadata data frame.
Ideally, and I know this is incorrect, but I'll put it here for an example, I'd like to do something like this:
msadata$Y_est <- cbind(dlply(msadata, "Dmsa", function(x)fitted.values(model(x))))
If I can decompose the list into individual Y estimates, I could join this to my msadata data frame by "Dmsa". I feel like this is very similar to Michael's answer here, but something is needed to separate the list elements prior to employing Michael's suggestion of join() or merge(). Any ideas?
Upvotes: 2
In the previous question, I proposed a data.table solution. I think it is more appropriate for what you want to do, since you want to fit models by group and then combine the results with the original data.
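library(data.table)
## msadata is the data frame from the question
DT <- as.data.table(msadata)
models <- DT[, {
  ## fit one Poisson model per Dmsa group; columns are visible inside j
  mod <- glm(Y ~ D + O + A + log(M), family = poisson(link = "log"))
  ## one output row per observation: residual, fitted value, model frame
  data.frame(res = mod$residuals,      # working residuals stored by glm
             fit = mod$fitted.values,
             mod$model)
},
by = Dmsa]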
Here is an application of this approach with some data:
## create some data
set.seed(1)
d.AD <- data.frame(
  counts = sample(c(10:30), 18, replace = TRUE),
  outcome = gl(3, 1, 18),
  treatment = gl(3, 6),
  type = sample(c(1, 2), 18, replace = TRUE))  ## type is the grouping variable
## coerce data to a data.table
library(data.table)
DT <- as.data.table(d.AD)
## apply models
DT[, {
  mod <- glm(formula = counts ~ outcome + treatment,
             family = poisson())
  data.frame(res = mod$residuals,
             fit = mod$fitted.values,
             mod$model)
},
by = type]
type res fit counts outcome treatment
1: 1 -3.550408e-01 23.25729 15 1 1
2: 1 2.469211e-01 23.25729 29 1 1
3: 1 9.866698e-02 25.48543 28 3 1
4: 1 5.994295e-01 18.13147 29 1 2
5: 1 4.633974e-16 23.00000 23 2 2
6: 1 1.576093e-01 19.86853 23 3 2
7: 1 -3.933199e-01 18.13147 11 1 2
8: 1 -3.456991e-01 19.86853 13 3 2
9: 1 6.141856e-02 22.61125 24 1 3
10: 1 4.933908e-02 24.77750 26 3 3
11: 1 -1.154845e-01 22.61125 20 1 3
12: 2 9.229985e-02 15.56349 17 1 1
13: 2 5.805515e-03 21.87302 22 2 1
14: 2 -1.004589e-01 15.56349 14 1 1
15: 2 2.537653e-16 14.00000 14 1 2
16: 2 -1.603110e-01 21.43651 18 1 3
17: 2 1.662347e-01 21.43651 25 1 3
18: 2 -4.214963e-03 30.12698 30 2 3
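If the goal is only to get the fitted values and residuals into the original table, one possible variation is to assign them by reference with := inside the grouped call, so no join() or merge() is needed afterwards. This is a sketch under assumptions, not something tested on your msadata: the toy data is made up for illustration (type alternates so each group contains every factor level), and the column names fit and res are my own choice.

```r
library(data.table)

## toy data, for illustration only
set.seed(1)
d.AD <- data.frame(
  counts    = sample(10:30, 18, replace = TRUE),
  outcome   = gl(3, 1, 18),
  treatment = gl(3, 6),
  type      = rep(c(1, 2), times = 9))  # grouping variable

DT <- as.data.table(d.AD)

## fit one model per group and write the results back as new columns
DT[, c("fit", "res") := {
  mod <- glm(counts ~ outcome + treatment, family = poisson())
  list(fitted(mod),      # fitted values on the response scale
       mod$residuals)    # working residuals, as in the code above
}, by = type]

head(DT)
```

Because := modifies DT in place, the rows keep their original order and each observation gets the estimate from the model of its own group. With your data, the same pattern would use your formula and by = Dmsa.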
Upvotes: 2