Carter

Reputation: 179

Appending results of dlply function to original table

This question builds on the answer that Simon and James provided here

The dlply function worked well to give me Y estimates within my data subsets. Now, my challenge is getting these Y estimates and residuals back into the original data frame to calculate goodness of fit statistics and for further analysis.

I was able to use cbind to convert the dlply output list into a table, but this doesn't quite work, because each row holds an entire vector of estimates (sorry about the poor markdown):

model <- function(df){ glm(Y~D+O+A+log(M), family=poisson(link="log"), data=df)}
Modrpt <- ddply(msadata, "Dmsa", function(x)coef(model(x)))
Modest <- cbind(dlply(msadata, "Dmsa", function(x) fitted.values(model(x))))

Subset name | Y_Estimates
-------------------------
Dmsa 1      | c(4353.234, 234.34,...
Dmsa 2      | c(998.234, 2543.55,...

This doesn't really solve the problem, because I need to get the individual Y estimates (separated by commas in the Y_Estimates column of the Modest data frame) into my msadata data frame.

Ideally (I know this is incorrect, but it illustrates the idea), I'd like to do something like this:

msadata$Y_est <- cbind(dlply(msadata, "Dmsa", function(x)fitted.values(model(x))))

If I can decompose the list into individual Y estimates, I could join this to my msadata data frame by "Dmsa". I feel like this is very similar to Michael's answer here, but something is needed to separate the list elements prior to employing Michael's suggestion of join() or merge(). Any ideas?

Upvotes: 2

Views: 373

Answers (1)

agstudy

Reputation: 121626

In the previous question, I proposed a data.table solution. I think it is better suited to what you want to do, since you want to fit models by group and then combine the results with the original data.

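library(data.table)
DT <- as.data.table(df)   ## df is your data frame (msadata in the question)
models <- DT[, {
  ## fit one Poisson GLM per Dmsa group
  mod <- glm(Y ~ D + O + A + log(M), family = poisson(link = "log"))
  ## mod$residuals holds the working residuals of the fit
  data.frame(res = mod$residuals,
             fit = mod$fitted.values,
             mod$model)
}, by = Dmsa]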

Here is an application with some sample data:

## create some data
set.seed(1)
d.AD <- data.frame(
  counts = sample(10:30, 18, replace = TRUE),
  outcome = gl(3, 1, 18),
  treatment = gl(3, 6),
  type = sample(c(1, 2), 18, replace = TRUE))  ## type is the grouping variable
## coerce the data to a data.table
library(data.table)
DT <- as.data.table(d.AD)
## fit one model per group; return residuals, fitted values and the model data
DT[, {
  mod <- glm(formula = counts ~ outcome + treatment,
             family = poisson())
  data.frame(res = mod$residuals,
             fit = mod$fitted.values,
             mod$model)
}, by = type]

   type           res      fit counts outcome treatment
 1:    1 -3.550408e-01 23.25729     15       1         1
 2:    1  2.469211e-01 23.25729     29       1         1
 3:    1  9.866698e-02 25.48543     28       3         1
 4:    1  5.994295e-01 18.13147     29       1         2
 5:    1  4.633974e-16 23.00000     23       2         2
 6:    1  1.576093e-01 19.86853     23       3         2
 7:    1 -3.933199e-01 18.13147     11       1         2
 8:    1 -3.456991e-01 19.86853     13       3         2
 9:    1  6.141856e-02 22.61125     24       1         3
10:    1  4.933908e-02 24.77750     26       3         3
11:    1 -1.154845e-01 22.61125     20       1         3
12:    2  9.229985e-02 15.56349     17       1         1
13:    2  5.805515e-03 21.87302     22       2         1
14:    2 -1.004589e-01 15.56349     14       1         1
15:    2  2.537653e-16 14.00000     14       1         2
16:    2 -1.603110e-01 21.43651     18       1         3
17:    2  1.662347e-01 21.43651     25       1         3
18:    2 -4.214963e-03 30.12698     30       2         3
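
If the goal is to keep the original table and simply append the estimates to it, a possible variation is to add the columns by reference with data.table's `:=` operator instead of building a new table. The sketch below is only an illustration under a few assumptions: the grouping column `type` is assigned deterministically here (not sampled as above) so that every group contains several factor levels, the column names `fit` and `res` are made up for the example, and the residuals are response residuals (observed minus fitted) instead of the working residuals stored in `mod$residuals`.

```r
library(data.table)

## Illustrative data; `type` alternates 1, 2, 1, 2, ... (an assumption made
## for this sketch so that each group sees every factor level)
set.seed(1)
d.AD <- data.frame(
  counts    = sample(10:30, 18, replace = TRUE),
  outcome   = gl(3, 1, 18),
  treatment = gl(3, 6),
  type      = rep(1:2, times = 9)
)
DT <- as.data.table(d.AD)

## Fit one Poisson GLM per group and append the results to DT by reference.
## .SD is the subset of rows belonging to the current group.
DT[, c("fit", "res") := {
  mod <- glm(counts ~ outcome + treatment, family = poisson(), data = .SD)
  list(fitted(mod),                          # fitted values, one per row
       residuals(mod, type = "response"))    # observed minus fitted
}, by = type]

print(DT)
```

Because `:=` assigns within each group, the row order of `DT` is left unchanged and no separate join() or merge() step should be needed. Note that this assumes no rows are dropped by the model (for example because of missing values); otherwise the fitted vector would be shorter than the group.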

Upvotes: 2
