How to use broom::tidy with multiple models?

Question

I'm trying to summarize the results of 19 polynomial regression models by using broom. I've followed this SO Question and am trying to use it with broom::tidy. My script is as follows:

ALTER PROCEDURE [dbo].[spRegressionPeak]
@StudyID int
AS
BEGIN
Declare @sStudyID VARCHAR(50)
Set @sStudyID = CONVERT(VARCHAR(50),@StudyID)

--We are selecting the distinct StudyID, Productnumber, ResponseID and mean 
values 1 thorugh 6 from the CodeMeans table.  
--Note that spCodeMeans must be run before running this stored procedure to 
ensure response data exists in the CodeMeans table.
--We use IsNull values to pass zeroes where an average wasn't calculated os that 
the polynomial regression can be calculated.
DECLARE @inquery  AS NVARCHAR(MAX) = '
    Select
c.StudyID, c.RespID, c.LikingOrder, avg(isnull(C1,0)) as C1, avg(isnull(C2,0)) as C2, avg(isnull(C3,0)) as C3, avg(isnull(C4,0)) as C4, 
avg(isnull(C5,0)) as C5, avg(isnull(C6,0)) as C6, avg(isnull(C7,0)) as C7, avg(isnull(C8,0)) as C8, avg(isnull(C9,0)) as C9, 
avg(isnull(C10,0)) as C10, avg(isnull(C11,0)) as C11, avg(isnull(C12,0)) as C12, avg(isnull(C13,0)) as C13, avg(isnull(C14,0)) as C14, 
avg(isnull(C15,0)) as C15, avg(isnull(C16,0)) as C16, avg(isnull(C17,0)) as C17, avg(isnull(C18,0)) as C18, avg(isnull(C19,0)) as C19
from ClosedStudyResponses c
where c.StudyID = @StudyID
group by StudyID, RespID, LikingOrder
order by RespID
        '


--We are setting @inquery aka InputDataSet to be our initial dataset.  
--R Services requires that a data.frame be passed to any calculations being 
generated.  As such, df is simply data framing the @inquery data.
--The res object holds the polynomial regression results by RespondentID and 
LikingOrder for each of the averages in the @inquery resultset.

EXEC sp_execute_external_script @language = N'R'
    , @script = N'
    library(tidyr, broom)

    studymeans <- InputDataSet

    df <- data.frame(studymeans) 

    lin.mod.1 <- lm(df$LikingOrder ~ poly(df$C1,3, raw=TRUE))
    lin.mod.2 <- lm(df$LikingOrder ~ poly(df$C2,3, raw=TRUE))
    lin.mod.3 <- lm(df$LikingOrder ~ poly(df$C3,3, raw=TRUE))
    lin.mod.4 <- lm(df$LikingOrder ~ poly(df$C4,3, raw=TRUE))
    lin.mod.5 <- lm(df$LikingOrder ~ poly(df$C5,3, raw=TRUE))
    lin.mod.6 <- lm(df$LikingOrder ~ poly(df$C6,3, raw=TRUE))
    lin.mod.7 <- lm(df$LikingOrder ~ poly(df$C7,3, raw=TRUE))
    lin.mod.8 <- lm(df$LikingOrder ~ poly(df$C8,3, raw=TRUE))
    lin.mod.9 <- lm(df$LikingOrder ~ poly(df$C9,3, raw=TRUE))
    lin.mod.10 <- lm(df$LikingOrder ~ poly(df$C10,3, raw=TRUE))
    lin.mod.11 <- lm(df$LikingOrder ~ poly(df$C11,3, raw=TRUE))
    lin.mod.12 <- lm(df$LikingOrder ~ poly(df$C12,3, raw=TRUE))
    lin.mod.13 <- lm(df$LikingOrder ~ poly(df$C13,3, raw=TRUE))
    lin.mod.14 <- lm(df$LikingOrder ~ poly(df$C14,3, raw=TRUE))
    lin.mod.15 <- lm(df$LikingOrder ~ poly(df$C15,3, raw=TRUE))
    lin.mod.16 <- lm(df$LikingOrder ~ poly(df$C16,3, raw=TRUE))
    lin.mod.17 <- lm(df$LikingOrder ~ poly(df$C17,3, raw=TRUE))
    lin.mod.18 <- lm(df$LikingOrder ~ poly(df$C18,3, raw=TRUE))
    lin.mod.19 <- lm(df$LikingOrder ~ poly(df$C19,3, raw=TRUE))

    lst <- lapply(ls(pattern="lin.mod"), get)
    allmodels <- lapply(lst, summary)

    res <- broom::tidy(allmodels)
'
, @input_data_1 = @inquery
, @output_data_1_name = N'res'
, @params = N'@StudyID int'
,@StudyID = @StudyID 
--- Edit this line to handle the output data frame.
--WITH RESULT SETS ((StudyID int, RespID int, LikingOrder int, NewColumn int, 
res varchar(max)));
END;

The above script throws the following error when a valid StudyID input parameter is passed to it:

Error in setNames(data.frame(data), value.name) : 
'names' attribute [1] must be the same length as the vector [0]
Calls: source ...  ->  -> melt.default -> setNames
In addition: There were 50 or more warnings (use warnings() to see the first 
50)

My input data is as follows: The desired outcome is to obtain the summary of all 19 models in a data.frame. How do I resolve the error and modify my code to accomplish the end result?

Calum You · Accepted Answer

Without your working environment I am not sure exactly how your data is set up, but it seems like you are trying to fit a model with the same dependent variable on a number of predictor columns. I think the missing piece is a call to rowwise as per the broom and dplyr vignette, but not totally sure. Nevertheless, here is a working example with the mtcars dataset. Note that the structure is to use tidy on a rowwise dataframe with a list-column containing the models, rather than directly on a list. You can also directly create the models by mapping over the dataframe containing predictor columns, rather than the mess of storing models in the environment and needing to use get and ls. Any time you find yourself using ls, think whether you can put your elements in a list!

Edit: After looking over the vignette again prompted by this question, I realised you can actually just do a quick pipe as now shown (see edit history for a method using enframe. By gathering the data into a format that suits grouped model fitting, you can get the desired results much more tidily!

library(tidyverse)
library(broom)

mtcars %>%
  gather(predictor, measure, -mpg) %>%
  group_by(predictor) %>%
  do(tidy(lm(mpg ~ measure, .)))
#> # A tibble: 20 x 6
#> # Groups:   predictor [10]
#>    predictor term        estimate std.error statistic  p.value
#>                                 
#>  1 am        (Intercept)  17.1      1.12       15.2   1.13e-15
#>  2 am        measure       7.24     1.76        4.11  2.85e- 4
#>  3 carb      (Intercept)  25.9      1.84       14.1   9.22e-15
#>  4 carb      measure      -2.06     0.569      -3.62  1.08e- 3
#>  5 cyl       (Intercept)  37.9      2.07       18.3   8.37e-18
#>  6 cyl       measure      -2.88     0.322      -8.92  6.11e-10
#>  7 disp      (Intercept)  29.6      1.23       24.1   3.58e-21
#>  8 disp      measure      -0.0412   0.00471    -8.75  9.38e-10
#>  9 drat      (Intercept)  -7.52     5.48       -1.37  1.80e- 1
#> 10 drat      measure       7.68     1.51        5.10  1.78e- 5
#> 11 gear      (Intercept)   5.62     4.92        1.14  2.62e- 1
#> 12 gear      measure       3.92     1.31        3.00  5.40e- 3
#> 13 hp        (Intercept)  30.1      1.63       18.4   6.64e-18
#> 14 hp        measure      -0.0682   0.0101     -6.74  1.79e- 7
#> 15 qsec      (Intercept)  -5.11    10.0        -0.510 6.14e- 1
#> 16 qsec      measure       1.41     0.559       2.53  1.71e- 2
#> 17 vs        (Intercept)  16.6      1.08       15.4   8.85e-16
#> 18 vs        measure       7.94     1.63        4.86  3.42e- 5
#> 19 wt        (Intercept)  37.3      1.88       19.9   8.24e-19
#> 20 wt        measure      -5.34     0.559      -9.56  1.29e-10

Created on 2018-07-10 by the reprex package (v0.2.0).

How to use broom::tidy with multiple models?

Answers (2)

Related Questions