I am trying to understand the difference between a graph and a graph learner. I can $train and $predict with a graph, but I need the "wrapper" in order to use row selection and scores (see the code below).
Is there something that can be done with a graph that is not at the same time a learner, i.e. something that works with gr in the code below but not with glrn?
library(mlr3)
library(mlr3learners)   # provides classif.kknn (requires the kknn package)
library(mlr3pipelines)

# 'task' is any classification task with at least 346 rows, e.g.:
task = tsk("spam")

gr = po(lrn("classif.kknn", predict_type = "prob"),
        param_vals = list(k = 10, distance = 2, kernel = "rectangular")) %>>%
  po("threshold", param_vals = list(thresholds = 0.6))
glrn = GraphLearner$new(gr)  # build a GraphLearner from the Graph
glrn$train(task, row_ids = 1:300)  # n.b.: we need to construct a GraphLearner in order to use row_ids etc.
predictions = glrn$predict(task, row_ids = 327:346)  # would not work with gr
predictions$score(msr("classif.acc"))
predictions$print()
A GraphLearner always wraps a Graph that takes a single Task as input and produces a single Prediction as output. A Graph can, however, represent any kind of computation and can even take multiple inputs / produce multiple outputs. You would often use these as intermediate building blocks when building a Graph that does training on a single task, giving a single prediction, which is then wrapped as a GraphLearner.
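As a minimal sketch (the choice of PipeOps here is illustrative), the following Graph has one input but two outputs, so it can be trained and used for prediction directly, but could not be wrapped as a GraphLearner:
library(mlr3)
library(mlr3pipelines)
# Copy the incoming Task and process the two copies differently:
# this Graph has a single Task input but *two* outputs.
multi = po("copy", outnum = 2) %>>%
  gunion(list(po("scale"), po("pca")))
out = multi$train(tsk("iris"))  # returns a list with one element per output
lapply(out, function(x) x$head(2))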
In some cases this can also be helpful when you do some kind of preprocessing, such as imputation or PCA, that should be applied to unseen data in the same way as to the training data (e.g. applying the same PCA rotation), even though your process as a whole is not classical machine learning that produces a model for prediction:
data <- tsk("pima")
trainingset <- sample(seq(0, 1, length.out = data$nrow) < 2/3)  # random 2/3 / 1/3 split
data.t <- data$clone(deep = TRUE)$filter(which(trainingset))
data.p <- data$clone(deep = TRUE)$filter(which(!trainingset))
# Operation:
# 1. impute missing values with mean of non-missings in same column
# 2. rotate to principal component axes
imputepca <- po("imputemean") %>>% po("pca")
# Need to take element 1 of result here: 'Graph' could have multiple
# outputs and therefore returns a list. In our case we only have one
# result that we care about.
rotated.t <- imputepca$train(data.t)[[1]]
rotated.t$head(2)
#> diabetes PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8
#> 1: pos -4.744963 27.76824 -5.2432401 9.817512 -9.042784 0.4979002 0.4574355 -0.1058608
#> 2: neg 6.341357 -37.18033 -0.1210501 3.731123 -1.451952 3.6890699 2.3901156 0.0755521
# this data is imputed using the column means of the training data, and then
# rotated by the same rotation as the training data.
rotated.p <- imputepca$predict(data.p)[[1]]
rotated.p$head(2)
#> diabetes PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8
#> 1: pos -11.535952 9.358736 25.1073705 4.761627 -23.313410 -9.743428 3.412071 -1.6403521
#> 2: neg 1.189971 -7.098455 -0.2785817 -3.280845 -0.281516 -2.277787 -6.746323 0.3434535
However, since mlr3pipelines is mainly built for mlr3, which is about having Learners that can be trained, resampled etc., you will usually end up wrapping your Graphs in GraphLearners.
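For instance, here is a minimal sketch of that typical path (the imputation + rpart pipeline is illustrative): once wrapped, the GraphLearner plugs into the standard mlr3 machinery such as resampling, like any other Learner:
library(mlr3)
library(mlr3pipelines)
# Wrap a preprocessing + model Graph so it behaves like a single Learner
glrn = GraphLearner$new(po("imputemean") %>>% lrn("classif.rpart"))
# Cross-validate the whole pipeline; imputation is refit within each fold
rr = resample(tsk("pima"), glrn, rsmp("cv", folds = 3))
rr$aggregate(msr("classif.acc"))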