Reputation: 807
I would like to compare model performance for a bunch of models using the same predictors but different model parameters. This seems like the place to use broom
to create a tidy output, but I can't figure it out.
Here's some non-working code that helps suggest what I'm thinking about:
seq(1:10) %>%
do(fit = knn(train_Market, test_Market, train_Direction, k=.), score = mean(fit==test_Direction)) %>%
tidy()
For more context, this is part of one of the ISLR labs that we are trying to tidyverse-ify. You can see the entire lab here: https://github.com/AmeliaMN/tidy-islr/blob/master/lab3/lab3.Rmd
[Update: reproducible example] It's hard to make a minimal example here because of the need for data wrangling before model fitting, but this should be reproducible:
library(class)
library(ISLR)
library(dplyr)

train = Smarket %>%
  filter(Year < 2005)
test = Smarket %>%
  filter(Year >= 2005)

train_Market = train %>%
  select(Lag1, Lag2)
test_Market = test %>%
  select(Lag1, Lag2)

train_Direction = train %>%
  select(Direction) %>%
  .$Direction
test_Direction = test %>%
  select(Direction) %>%
  .$Direction

set.seed(1)
knn_pred = knn(train_Market, test_Market, train_Direction, k=1)
mean(knn_pred==test_Direction)
knn_pred = knn(train_Market, test_Market, train_Direction, k=3)
mean(knn_pred==test_Direction)
knn_pred = knn(train_Market, test_Market, train_Direction, k=4)
mean(knn_pred==test_Direction)
etc.
Upvotes: 4
Views: 317
Reputation: 78610
Since the output of each knn call (and of the oracle) is a vector, this is a good case for tidyr's unnest, in combination with purrr's map and rep_along:
library(class)
library(dplyr)
library(purrr)
library(tidyr)

set.seed(1)
predictions <- data_frame(k = 1:5) %>%
  unnest(prediction = map(k, ~ knn(train_Market, test_Market, train_Direction, k = .))) %>%
  mutate(oracle = rep_along(prediction, test_Direction))
The predictions variable is then organized as:
# A tibble: 1,260 x 3
k prediction oracle
<int> <fctr> <fctr>
1 1 Up Up
2 1 Down Up
3 1 Up Down
4 1 Up Up
5 1 Up Up
6 1 Down Up
7 1 Down Down
8 1 Down Up
9 1 Down Up
10 1 Up Up
# ... with 1,250 more rows
This can then easily be summarized:
predictions %>%
group_by(k) %>%
summarize(accuracy = mean(prediction == oracle))
Again, you don't need broom here since each output is a factor; but if each output were a fitted model, you could use broom's tidy or augment and then unnest in a similar fashion.
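For instance, here is a hypothetical sketch of that pattern (not part of your lab: it fits one lm per polynomial degree on the built-in mtcars data, purely to illustrate tidying model objects):

```r
library(broom)
library(dplyr)
library(purrr)
library(tidyr)

# Illustrative only: one lm() per polynomial degree, using built-in mtcars
fits <- tibble(degree = 1:3) %>%
  mutate(fit    = map(degree, ~ lm(mpg ~ poly(wt, .x), data = mtcars)),
         tidied = map(fit, tidy)) %>%
  select(degree, tidied) %>%
  unnest(tidied)
# one row per coefficient, with degree identifying which model it came from
```

Each model's coefficient table ends up stacked in one tidy tibble, so the same group_by/summarize workflow applies.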
One important aspect of this approach is that it scales to many combinations of parameters: combine them with tidyr's crossing (or base R's expand.grid) and use invoke_rows to apply the function to each row. For example, you could try variations of l alongside k:
crossing(k = 2:5, l = 0:1) %>%
invoke_rows(knn, ., train = train_Market, test = test_Market, cl = train_Direction) %>%
unnest(prediction = .out) %>%
mutate(oracle = rep_along(prediction, test_Direction)) %>%
group_by(k, l) %>%
summarize(accuracy = mean(prediction == oracle))
This returns:
Source: local data frame [8 x 3]
Groups: k [?]
k l accuracy
<int> <int> <dbl>
1 2 0 0.5396825
2 2 1 0.5277778
3 3 0 0.5317460
4 3 1 0.5317460
5 4 0 0.5277778
6 4 1 0.5357143
7 5 0 0.4841270
8 5 1 0.4841270
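A note if you're on newer package versions (an assumption about your setup): invoke_rows has since moved to the purrrlyr package, and the unnest(prediction = ...) syntax was retired in tidyr 1.0. A rough equivalent of the crossing example using purrr's pmap would be:

```r
library(class)
library(dplyr)
library(purrr)
library(tidyr)
library(ISLR)

# Same data prep as in the question
train <- filter(Smarket, Year < 2005)
test  <- filter(Smarket, Year >= 2005)
train_Market    <- select(train, Lag1, Lag2)
test_Market     <- select(test, Lag1, Lag2)
train_Direction <- train$Direction
test_Direction  <- test$Direction

set.seed(1)
crossing(k = 2:5, l = 0:1) %>%
  # one knn() fit per (k, l) row; each result is a factor of predictions
  mutate(prediction = pmap(list(k = k, l = l), knn,
                           train = train_Market, test = test_Market,
                           cl = train_Direction)) %>%
  unnest(prediction) %>%
  # recycle the true directions across each (k, l) block
  mutate(oracle = rep(test_Direction, length.out = n())) %>%
  group_by(k, l) %>%
  summarize(accuracy = mean(prediction == oracle), .groups = "drop")
```

The result is the same 8-row accuracy table, one row per (k, l) combination.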
Upvotes: 3