SidC
SidC

Reputation: 3213

How to use broom::tidy with multiple models?

I'm trying to summarize the results of 19 polynomial regression models by using broom. I've followed this SO Question and am trying to use it with broom::tidy. My script is as follows:

ALTER PROCEDURE [dbo].[spRegressionPeak]
@StudyID int
AS
BEGIN
Declare @sStudyID VARCHAR(50)
Set @sStudyID = CONVERT(VARCHAR(50),@StudyID)

--We are selecting the distinct StudyID, Productnumber, ResponseID and mean 
values 1 thorugh 6 from the CodeMeans table.  
--Note that spCodeMeans must be run before running this stored procedure to 
ensure response data exists in the CodeMeans table.
--We use IsNull values to pass zeroes where an average wasn't calculated os that 
the polynomial regression can be calculated.
DECLARE @inquery  AS NVARCHAR(MAX) = '
    Select
c.StudyID, c.RespID, c.LikingOrder, avg(isnull(C1,0)) as C1, avg(isnull(C2,0)) as C2, avg(isnull(C3,0)) as C3, avg(isnull(C4,0)) as C4, 
avg(isnull(C5,0)) as C5, avg(isnull(C6,0)) as C6, avg(isnull(C7,0)) as C7, avg(isnull(C8,0)) as C8, avg(isnull(C9,0)) as C9, 
avg(isnull(C10,0)) as C10, avg(isnull(C11,0)) as C11, avg(isnull(C12,0)) as C12, avg(isnull(C13,0)) as C13, avg(isnull(C14,0)) as C14, 
avg(isnull(C15,0)) as C15, avg(isnull(C16,0)) as C16, avg(isnull(C17,0)) as C17, avg(isnull(C18,0)) as C18, avg(isnull(C19,0)) as C19
from ClosedStudyResponses c
where c.StudyID = @StudyID
group by StudyID, RespID, LikingOrder
order by RespID
        '


--We are setting @inquery aka InputDataSet to be our initial dataset.  
--R Services requires that a data.frame be passed to any calculations being 
generated.  As such, df is simply data framing the @inquery data.
--The res object holds the polynomial regression results by RespondentID and 
LikingOrder for each of the averages in the @inquery resultset.

EXEC sp_execute_external_script @language = N'R'
    , @script = N'
    library(tidyr, broom)

    studymeans <- InputDataSet

    df <- data.frame(studymeans) 

    lin.mod.1 <- lm(df$LikingOrder ~ poly(df$C1,3, raw=TRUE))
    lin.mod.2 <- lm(df$LikingOrder ~ poly(df$C2,3, raw=TRUE))
    lin.mod.3 <- lm(df$LikingOrder ~ poly(df$C3,3, raw=TRUE))
    lin.mod.4 <- lm(df$LikingOrder ~ poly(df$C4,3, raw=TRUE))
    lin.mod.5 <- lm(df$LikingOrder ~ poly(df$C5,3, raw=TRUE))
    lin.mod.6 <- lm(df$LikingOrder ~ poly(df$C6,3, raw=TRUE))
    lin.mod.7 <- lm(df$LikingOrder ~ poly(df$C7,3, raw=TRUE))
    lin.mod.8 <- lm(df$LikingOrder ~ poly(df$C8,3, raw=TRUE))
    lin.mod.9 <- lm(df$LikingOrder ~ poly(df$C9,3, raw=TRUE))
    lin.mod.10 <- lm(df$LikingOrder ~ poly(df$C10,3, raw=TRUE))
    lin.mod.11 <- lm(df$LikingOrder ~ poly(df$C11,3, raw=TRUE))
    lin.mod.12 <- lm(df$LikingOrder ~ poly(df$C12,3, raw=TRUE))
    lin.mod.13 <- lm(df$LikingOrder ~ poly(df$C13,3, raw=TRUE))
    lin.mod.14 <- lm(df$LikingOrder ~ poly(df$C14,3, raw=TRUE))
    lin.mod.15 <- lm(df$LikingOrder ~ poly(df$C15,3, raw=TRUE))
    lin.mod.16 <- lm(df$LikingOrder ~ poly(df$C16,3, raw=TRUE))
    lin.mod.17 <- lm(df$LikingOrder ~ poly(df$C17,3, raw=TRUE))
    lin.mod.18 <- lm(df$LikingOrder ~ poly(df$C18,3, raw=TRUE))
    lin.mod.19 <- lm(df$LikingOrder ~ poly(df$C19,3, raw=TRUE))

    lst <- lapply(ls(pattern="lin.mod"), get)
    allmodels <- lapply(lst, summary)

    res <- broom::tidy(allmodels)
'
, @input_data_1 = @inquery
, @output_data_1_name = N'res'
, @params = N'@StudyID int'
,@StudyID = @StudyID 
--- Edit this line to handle the output data frame.
--WITH RESULT SETS ((StudyID int, RespID int, LikingOrder int, NewColumn int, 
res varchar(max)));
END;

The above script throws the following error when a valid StudyID input parameter is passed to it:

Error in setNames(data.frame(data), value.name) : 
'names' attribute [1] must be the same length as the vector [0]
Calls: source ... <Anonymous> -> <Anonymous> -> melt.default -> setNames
In addition: There were 50 or more warnings (use warnings() to see the first 
50)

My input data is as follows: enter image description here The desired outcome is to obtain the summary of all 19 models in a data.frame. How do I resolve the error and modify my code to accomplish the end result?

Upvotes: 0

Views: 2453

Answers (2)

Ben Bolker
Ben Bolker

Reputation: 226801

You haven't given us a reproducible example; here's something that seems to work. A few potential issues: you need to run tidy on models, not summaries; it's better to avoid $-indexing in model formulas.

library(purrr)
df <- mtcars
predvars <- colnames(mtcars)[-1]

... this would be paste0("C",1:19) in your case ...

respvar <- "mpg"  ## would be "LikingOrder"
predpolys <- sprintf("poly(%s,3,raw=TRUE)",predvars)
forms <- map(predpolys, reformulate,
             response=respvar)       ## construct formulas
names(forms) <- predvars             ## names will be inherited by model lists
modList <- map(forms, lm, data= df)  ## fit all models
res <- map(modList, broom::tidy)     ## tidy all models

If desired you can dplyr::bind_rows(res,.id="predvar") at this point, or you can replace map() with map_dfr(..., .id = "predvar") ...

Upvotes: 2

Calum You
Calum You

Reputation: 15072

Without your working environment I am not sure exactly how your data is set up, but it seems like you are trying to fit a model with the same dependent variable on a number of predictor columns. I think the missing piece is a call to rowwise as per the broom and dplyr vignette, but not totally sure. Nevertheless, here is a working example with the mtcars dataset. Note that the structure is to use tidy on a rowwise dataframe with a list-column containing the models, rather than directly on a list. You can also directly create the models by mapping over the dataframe containing predictor columns, rather than the mess of storing models in the environment and needing to use get and ls. Any time you find yourself using ls, think whether you can put your elements in a list!

Edit: After looking over the vignette again prompted by this question, I realised you can actually just do a quick pipe as now shown (see edit history for a method using enframe. By gathering the data into a format that suits grouped model fitting, you can get the desired results much more tidily!

library(tidyverse)
library(broom)

mtcars %>%
  gather(predictor, measure, -mpg) %>%
  group_by(predictor) %>%
  do(tidy(lm(mpg ~ measure, .)))
#> # A tibble: 20 x 6
#> # Groups:   predictor [10]
#>    predictor term        estimate std.error statistic  p.value
#>    <chr>     <chr>          <dbl>     <dbl>     <dbl>    <dbl>
#>  1 am        (Intercept)  17.1      1.12       15.2   1.13e-15
#>  2 am        measure       7.24     1.76        4.11  2.85e- 4
#>  3 carb      (Intercept)  25.9      1.84       14.1   9.22e-15
#>  4 carb      measure      -2.06     0.569      -3.62  1.08e- 3
#>  5 cyl       (Intercept)  37.9      2.07       18.3   8.37e-18
#>  6 cyl       measure      -2.88     0.322      -8.92  6.11e-10
#>  7 disp      (Intercept)  29.6      1.23       24.1   3.58e-21
#>  8 disp      measure      -0.0412   0.00471    -8.75  9.38e-10
#>  9 drat      (Intercept)  -7.52     5.48       -1.37  1.80e- 1
#> 10 drat      measure       7.68     1.51        5.10  1.78e- 5
#> 11 gear      (Intercept)   5.62     4.92        1.14  2.62e- 1
#> 12 gear      measure       3.92     1.31        3.00  5.40e- 3
#> 13 hp        (Intercept)  30.1      1.63       18.4   6.64e-18
#> 14 hp        measure      -0.0682   0.0101     -6.74  1.79e- 7
#> 15 qsec      (Intercept)  -5.11    10.0        -0.510 6.14e- 1
#> 16 qsec      measure       1.41     0.559       2.53  1.71e- 2
#> 17 vs        (Intercept)  16.6      1.08       15.4   8.85e-16
#> 18 vs        measure       7.94     1.63        4.86  3.42e- 5
#> 19 wt        (Intercept)  37.3      1.88       19.9   8.24e-19
#> 20 wt        measure      -5.34     0.559      -9.56  1.29e-10

Created on 2018-07-10 by the reprex package (v0.2.0).

Upvotes: 5

Related Questions