Reputation: 195
I would like to know whether it is possible to apply mlr3 pipeline preprocessing on a group-by basis.
For example, following the mlr3pipelines documentation, we can scale predictors with the following code:
library(mlr3)
library(mlr3pipelines)
task = tsk("iris")
pop = po("scalemaxabs")
pop$train(list(task))[[1]]$data()
But is it possible to do the scaling by group? For example, let's add a month column to the iris data:
library(mlr3)
library(mlr3pipelines)
task = tsk("iris")
dt = task$data()
dt[, month := c(rep(1, 50), rep(2, 50), rep(3, 50))]
task = as_task_classif(dt, target = "Species", id = "iris")
Is it possible to scale the predictors by the month column? That is, we want to scale every month separately. Using data.table, this is easy:
task$data()[, lapply(.SD, function(x) as.vector(scale(x))), .SDcols = names(dt)[2:5], by = month]
But is it possible to do this inside an mlr3pipelines graph?
Upvotes: 0
Views: 47
Reputation: 1491
If there is no PipeOp that has exactly this functionality, you can write your own. You have already solved the problem with data.table. mlr3pipelines also uses data.table internally, so it should be no problem to put your code into a PipeOp. The mlr3book explains how to write your own PipeOp.
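As a rough illustration of that suggestion, here is a minimal sketch of such a PipeOp, modelled on the custom-PipeOp pattern from the mlr3book (inheriting from PipeOpTaskPreproc and overriding .select_cols, .train_dt and .predict_dt). The class name PipeOpScaleByGroup, its group_col argument and the private helper .scale_with_state are made up for this example and are not part of mlr3pipelines. It assumes the grouping column is a feature of the task, and it stores the per-group means and standard deviations at training time so that prediction reuses them.

```r
library(mlr3)
library(mlr3pipelines)
library(data.table)

# Hypothetical PipeOp (not part of mlr3pipelines): scales numeric features
# separately within each level of a grouping column.
PipeOpScaleByGroup = R6::R6Class("PipeOpScaleByGroup",
  inherit = mlr3pipelines::PipeOpTaskPreproc,
  public = list(
    initialize = function(id = "scalebygroup", group_col = "month") {
      private$.group_col = group_col
      super$initialize(id = id)
    }
  ),
  private = list(
    .group_col = NULL,
    # Operate on all numeric features plus the grouping column itself.
    .select_cols = function(task) {
      union(task$feature_types[type == "numeric", id], private$.group_col)
    },
    # Learn per-group mean and sd, store them in the state, then scale.
    .train_dt = function(dt, levels, target) {
      g = private$.group_col
      cols = setdiff(names(dt), g)
      self$state = list(
        center = dt[, lapply(.SD, mean), by = g, .SDcols = cols],
        scale = dt[, lapply(.SD, stats::sd), by = g, .SDcols = cols]
      )
      private$.scale_with_state(dt)
    },
    # Reuse the training statistics at prediction time.
    .predict_dt = function(dt, levels) {
      private$.scale_with_state(dt)
    },
    # Helper: look up each row's group statistics and scale the columns.
    # Groups that were not seen during training yield NA values.
    .scale_with_state = function(dt) {
      g = private$.group_col
      cols = setdiff(names(dt), g)
      idx = match(dt[[g]], self$state$center[[g]])
      out = copy(dt)
      for (col in cols) {
        scaled = (dt[[col]] - self$state$center[[col]][idx]) /
          self$state$scale[[col]][idx]
        set(out, j = col, value = scaled)
      }
      out
    }
  )
)

# Rebuild the task from the question, with a month column.
dt = tsk("iris")$data()
dt[, month := rep(c(1, 2, 3), each = 50)]
task = as_task_classif(dt, target = "Species", id = "iris")

pop = PipeOpScaleByGroup$new(group_col = "month")
out = pop$train(list(task))[[1]]$data()
pred = pop$predict(list(task))[[1]]$data()
```

Because it inherits from PipeOpTaskPreproc, the object should be usable inside a graph like any other preprocessing PipeOp, for example in front of a learner. Note that a group with a single row or a constant column has an sd of NA or 0, which this sketch does not handle.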
Upvotes: 1