Mislav Sagovac

Reputation: 195

Apply mlr3 pipes on group by basis

I would like to know whether it is possible to apply mlr3pipelines preprocessing on a group-by basis.

For example, following the mlr3pipelines documentation, we can scale predictors with the following code:

library(mlr3)
library(mlr3pipelines)
task = tsk("iris")
pop = po("scalemaxabs")
pop$train(list(task))[[1]]$data()

But is it possible to do the scaling by group? For example, let's add a month column to the iris data:

library(mlr3)
library(mlr3pipelines)
task = tsk("iris")
dt = task$data()
dt[, month := c(rep(1, 50), rep(2, 50), rep(3, 50))]
task = as_task_classif(dt, target = "Species", id = "iris")

Is it possible to scale predictors by month column? That is, we want to scale every month separately. Using data.table, this is easy:

task$data()[, lapply(.SD, function(x) as.vector(scale(x))), .SDcols = names(dt)[2:5], by = month]

But is it possible to do this inside an mlr3pipelines graph?

Upvotes: 0

Views: 47

Answers (1)

be-marc

Reputation: 1491

If no existing PipeOp provides exactly this functionality, you can write your own. You have already solved the problem with data.table, and mlr3pipelines uses data.table internally as well, so wrapping your code in a custom PipeOp should be straightforward. The mlr3book explains how to write your own PipeOp.
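As a rough illustration, such a PipeOp might look like the sketch below. It assumes the extension pattern described in the mlr3pipelines documentation (inheriting from PipeOpTaskPreprocSimple and overriding the private .select_cols and .transform_dt methods). The class name PipeOpScaleByGroup and the group_col field are hypothetical names made up for this example, not part of mlr3pipelines. The sketch is stateless: it standardizes each group within whatever data it receives, at training and at prediction time alike, which may or may not be what you want.

```r
library(mlr3)
library(mlr3pipelines)
library(data.table)

# Hypothetical PipeOp (not part of mlr3pipelines): z-scores numeric
# features separately within each level of a grouping column.
PipeOpScaleByGroup = R6::R6Class("PipeOpScaleByGroup",
  inherit = mlr3pipelines::PipeOpTaskPreprocSimple,
  public = list(
    group_col = NULL,  # assumption: grouping column kept as a plain field
    initialize = function(id = "scalebygroup", group_col = "month") {
      super$initialize(id = id)
      self$group_col = group_col
    }
  ),
  private = list(
    # Operate on the numeric features plus the grouping column.
    .select_cols = function(task) {
      ft = task$feature_types
      union(ft$id[ft$type == "numeric"], self$group_col)
    },
    # Scale every selected column except the grouping column, by group.
    .transform_dt = function(dt, levels) {
      grp = self$group_col
      cols = setdiff(names(dt), grp)
      dt = data.table::copy(dt)
      dt[, (cols) := lapply(.SD, function(x) as.vector(scale(x))),
        by = c(grp), .SDcols = cols]
      dt
    }
  )
)

# Same setup as in the question
task = tsk("iris")
dt = task$data()
dt[, month := rep(c(1, 2, 3), each = 50)]
task = as_task_classif(dt, target = "Species", id = "iris")

pop = PipeOpScaleByGroup$new(group_col = "month")
scaled = pop$train(list(task))[[1]]$data()
```

Since this is an ordinary PipeOp, it should also be usable inside a graph, for example pop %>>% lrn("classif.rpart"). Note that a group with a single row has an undefined standard deviation, so scale() would return NaN for it.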

Upvotes: 1
