ohnoplus

Reputation: 1355

Mapping a model to a permuted data set sometimes returns the model equation, rather than model output

I am trying to use the permute function in modelr with purrr map to calculate the mean values of two categories of data under permutation.

The function behaves as one would expect if I am trying to calculate linear models off of the permuted data sets, as per the example file for modelr::permute (though I am running the linear model inside of a custom function):

library(tidyverse) 
library(modelr)

perms <- permute(mtcars,  1000, mpg)
jlm <- function(df){lm(mpg ~ wt, data = df)}
models3 <- map(perms$perm, jlm)
models3[[1]]
Call:
lm(formula = mpg ~ wt, data = df)

Coefficients:
(Intercept)           wt  
     28.211       -2.524

Now, instead of a linear model, I just want mean values for two categories in that data set. I tried running as follows.

mean_of_vs <- function(df){
  df %>% group_by(vs) %>% summarize(mean(mpg)) %>% spread(vs, `mean(mpg)`) %>%
    rename(zero = `0`, one = `1`)
}

models4 <- map(perms$perm, ~mean_of_vs)

models4[[1]]

but this just returns the function definition, rather than the output of the function:

function(df){
  df %>% group_by(vs) %>% summarize(mean(mpg)) %>% spread(vs, `mean(mpg)`) %>%
    rename(zero = `0`, one = `1`)
}

The function works by itself on a simple data frame.

test <- perms %>% pull(perm) %>% .[[1]] %>% as.data.frame

mean_of_vs(test)
# A tibble: 1 x 2
   zero   one
  <dbl> <dbl>
1  16.6  24.5

So my question is: why doesn't my custom function return a list of one-row data frames with the mean mpg for vs = 0 and vs = 1, and how would I get it to do this?

Thanks.

Upvotes: 0

Views: 123

Answers (2)

Steve Lee

Reputation: 776

I am glad to meet you.

modelr::permute() produces objects whose class is 'permutation':

> class(perms[[1]][1][[1]])

[1] "permutation"

A permutation object has 3 elements:

data: the data in this variable
columns: the columns you permute
idx: indexes indicating which rows have been selected

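I think a permutation object only works directly with certain modelling functions (lm() and the like; I am not sure of the full list). lm() can take it because it converts a non-data-frame data argument with as.data.frame() internally, whereas dplyr verbs such as group_by() do not.

So if you want to use your own function, you have to convert the permutation to a data.frame/data.table/tibble first, like below: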

mean_of_vs <- function(df){
   df %>% as.data.frame() %>% group_by(vs) %>% summarize(mean(mpg)) %>% spread(vs, `mean(mpg)`) %>%
     rename(zero = `0`, one = `1`)
}

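Then call map() without the ~ notation. Writing ~mean_of_vs creates an anonymous function whose body is just the name mean_of_vs, so it returns the function itself instead of calling it, which is why you got the function definition back.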

models4 <- map(perms$perm, mean_of_vs)

Then you will get the result:

.....

[[97]]
# A tibble: 1 x 2
   zero   one
  <dbl> <dbl>
1  21.4  18.4

[[98]]
# A tibble: 1 x 2
   zero   one
  <dbl> <dbl>
1  20.4  19.7

.....
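
To see the effect of the ~ notation in isolation, here is a minimal sketch using a toy function. The name double_it is a hypothetical helper made up for illustration; only purrr::map() and purrr::map_dbl() come from the library.

```r
library(purrr)

# Hypothetical helper, used only to illustrate the difference.
double_it <- function(x) x * 2

# ~double_it builds a lambda whose body is just the function's name,
# so every element of the result is the function object, never its output.
not_called <- map(1:3, ~double_it)
is.function(not_called[[1]])

# Passing the function itself, or calling it on .x inside the lambda,
# actually applies it to each element.
by_name   <- map_dbl(1:3, double_it)
by_lambda <- map_dbl(1:3, ~double_it(.x))
identical(by_name, by_lambda)
```

In the question's code, map(perms$perm, ~mean_of_vs(.x)) or map(perms$perm, mean_of_vs) would be the equivalent of the last two calls.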

Upvotes: 1

John Colby

Reputation: 22588

permute() returns a list column of S3 permutation objects, not data frames.

> perms
# A tibble: 1,000 x 2
   perm              .id
   <list>            <chr>
 1 <S3: permutation> 0001
 2 <S3: permutation> 0002
 3 <S3: permutation> 0003
 4 <S3: permutation> 0004
 5 <S3: permutation> 0005
 6 <S3: permutation> 0006
 7 <S3: permutation> 0007
 8 <S3: permutation> 0008
 9 <S3: permutation> 0009
10 <S3: permutation> 0010
# ... with 990 more rows

Examining it reveals the data frame is stored as the first element in the named list:

> glimpse(perms[[1,1]])
List of 3
 $ data   :'data.frame':    32 obs. of  11 variables:
  ..$ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
  ..$ cyl : num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
  ..$ disp: num [1:32] 160 160 108 258 360 ...
  ..$ hp  : num [1:32] 110 110 93 110 175 105 245 62 95 123 ...
  ..$ drat: num [1:32] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
  ..$ wt  : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
  ..$ qsec: num [1:32] 16.5 17 18.6 19.4 17 ...
  ..$ vs  : num [1:32] 0 0 1 1 0 1 0 1 1 1 ...
  ..$ am  : num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
  ..$ gear: num [1:32] 4 4 4 3 3 3 3 4 4 4 ...
  ..$ carb: num [1:32] 4 4 1 1 2 1 4 2 2 4 ...
 $ columns: Named chr "mpg"
  ..- attr(*, "names")= chr "mpg"
 $ idx    : int [1:32] 1 30 21 12 27 14 17 2 15 32 ...
 - attr(*, "class")= chr "permutation"

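So to make your function run, access the data element in the first step of your mean_of_vs() function. Be aware that, as the glimpse above shows, data holds mpg in its original mtcars order and the shuffle is stored separately in idx, so this gives the same unpermuted group means for every permutation; converting with as.data.frame(df) instead applies the shuffle.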

mean_of_vs <- function(df) {
  df$data %>% 
    group_by(vs) %>% 
    summarize(mean(mpg)) %>% 
    spread(vs, `mean(mpg)`) %>%
    rename(zero = `0`, one = `1`)
}

Now things work as expected:

> models4 <- map(perms$perm, mean_of_vs)
> models4[[1]]
# A tibble: 1 x 2
   zero   one
  <dbl> <dbl>
1  16.6  24.6
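
As a hedged check on which part of a permutation object carries the shuffled values, the sketch below compares the data element with the result of as.data.frame(). It assumes modelr's as.data.frame() method for permutation objects applies idx to the permuted column, which matches the structure shown in the glimpse; the seed and the choice of 5 permutations are arbitrary.

```r
library(modelr)

set.seed(1)                        # arbitrary seed so the shuffle is repeatable
perms <- permute(mtcars, 5, mpg)   # 5 permutations is an arbitrary small number
p <- perms$perm[[1]]

# The data element is the untouched input; mpg is still in mtcars order.
identical(p$data$mpg, mtcars$mpg)

# as.data.frame() materialises the permutation: mpg is reordered,
# while the other columns such as vs stay where they were.
shuffled <- as.data.frame(p)
identical(shuffled$vs, mtcars$vs)
all.equal(sort(shuffled$mpg), sort(mtcars$mpg))

# Group means of mpg by vs, with and without the shuffle (base R only).
tapply(p$data$mpg, p$data$vs, mean)
tapply(shuffled$mpg, shuffled$vs, mean)
```

If the two tapply() results differ, the function should start from as.data.frame(df) rather than df$data to obtain permuted means.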

Upvotes: 2
