ds_col

Reputation: 139

Difference between graph and graph learner

I am trying to understand the difference between a Graph and a GraphLearner. I can $train and $predict with a Graph, but I need the "wrapper" in order to use row selection and scores (see the code below).

Is there something that can be done with a Graph that is not at the same time possible with a Learner? (Something possible with gr in the code below, but not with glrn?)



gr = po(lrn("classif.kknn", predict_type = "prob"),
        param_vals = list(k = 10, distance = 2, kernel = "rectangular")) %>>%
  po("threshold", param_vals = list(thresholds = 0.6))


glrn = GraphLearner$new(gr)  # build GraphLearner from Graph

glrn$train(task, row_ids = 1:300)  # n.b.: we need to construct a GraphLearner in order to use row_ids etc.

predictions = glrn$predict(task, row_ids = 327:346)  # would not work with gr

predictions$score(msr("classif.acc"))
predictions$print()

Upvotes: 1

Views: 73

Answers (1)

mb706

Reputation: 672

A GraphLearner always wraps a Graph that takes a single Task as input and produces a single Prediction as output. A Graph can, however, represent any kind of computation and can even take multiple inputs / produce multiple outputs. You would often use these as intermediate building blocks when building a Graph that does training on a single task, giving a single prediction, which is then wrapped as a GraphLearner.
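To illustrate the multiple-output case, here is a minimal sketch (not from the original answer): a Graph that copies its input Task and runs two different preprocessing steps on the copies. Because it ends in two outputs rather than a single Prediction, it cannot be wrapped as a GraphLearner, but it can still be trained as a Graph.

```r
library("mlr3")
library("mlr3pipelines")

# One input, two outputs: copy the task, then scale one copy
# and run PCA on the other.
gr2 <- po("copy", outnum = 2) %>>%
  gunion(list(po("scale"), po("pca")))

out <- gr2$train(tsk("iris"))
length(out)  # a list with two output Tasks, not a single Prediction
```

Trying `GraphLearner$new(gr2)` would fail here, since a GraphLearner requires exactly one output channel.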

In some cases a bare Graph is also helpful when you do some kind of preprocessing, such as imputation or PCA, that should later be applied to unseen data (e.g. applying the same PCA rotation), even though your process as a whole is not classical machine learning that produces a model for prediction:

library("mlr3")
library("mlr3pipelines")

data <- tsk("pima")
# randomly assign about 2/3 of the rows to the training set
trainingset <- sample(seq(0, 1, length.out = data$nrow) < 2/3)
data.t <- data$clone(deep = TRUE)$filter(which(trainingset))
data.p <- data$clone(deep = TRUE)$filter(which(!trainingset))

# Operation:
# 1. impute missing values with mean of non-missings in same column
# 2. rotate to principal component axes
imputepca <- po("imputemean") %>>% po("pca")

# Need to take element 1 of result here: 'Graph' could have multiple
# outputs and therefore returns a list. In our case we only have one
# result that we care about.
rotated.t <- imputepca$train(data.t)[[1]]

rotated.t$head(2)
#>    diabetes       PC1       PC2        PC3      PC4       PC5       PC6       PC7        PC8
#> 1:      pos -4.744963  27.76824 -5.2432401 9.817512 -9.042784 0.4979002 0.4574355 -0.1058608
#> 2:      neg  6.341357 -37.18033 -0.1210501 3.731123 -1.451952 3.6890699 2.3901156  0.0755521

# this data is imputed using the column means of the training data, and then
# rotated by the same rotation as the training data.
rotated.p <- imputepca$predict(data.p)[[1]]

rotated.p$head(2)
#>    diabetes        PC1       PC2        PC3       PC4        PC5       PC6       PC7        PC8
#> 1:      pos -11.535952  9.358736 25.1073705  4.761627 -23.313410 -9.743428  3.412071 -1.6403521
#> 2:      neg   1.189971 -7.098455 -0.2785817 -3.280845  -0.281516 -2.277787 -6.746323  0.3434535

However, since mlr3pipelines is mainly built for mlr3, which is about having Learners that can be trained and resampled etc., you will usually end up wrapping your Graphs in GraphLearners.
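As a sketch of that usual pattern (my addition, not part of the original answer): extending the imputation/PCA pipeline above with a learner so the Graph ends in a single Prediction, then wrapping it as a GraphLearner that works with resample() like any other Learner. The choice of classif.rpart is arbitrary here.

```r
library("mlr3")
library("mlr3pipelines")

# Preprocessing Graph + learner = a Graph with a single Prediction
# output, which can be wrapped as a GraphLearner.
glrn <- GraphLearner$new(
  po("imputemean") %>>% po("pca") %>>% lrn("classif.rpart")
)

# The wrapped pipeline can now be resampled as a whole, so the
# imputation means and PCA rotation are re-estimated per fold.
rr <- resample(tsk("pima"), glrn, rsmp("cv", folds = 3))
rr$aggregate(msr("classif.ce"))
```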

Upvotes: 4
