willing_astronomer

Reputation: 131

User defined impurity in Regression Decision Trees

I am migrating from R to PySpark. I have a process that builds a regression tree, currently implemented with R's rpart package.

While configuring this in PySpark, I can't find an option to specify a custom impurity function. I have a skewed dataset, and instead of using the mean and variance/standard deviation as the criterion for a node's impurity, I want to use a metric better suited to my skewed data. How can I define a custom impurity function in PySpark?

I've looked at the documentation for Decision Tree Regression, and the documentation for the impurity parameter only mentions support for variance:

impurity = Param(parent='undefined', name='impurity', doc='Criterion used for information gain calculation (case-insensitive). Supported options: variance')
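This can also be confirmed at runtime; explainParam is a standard method on PySpark ML estimators:

from pyspark.ml.regression import DecisionTreeRegressor

# Print the documentation string for the impurity param,
# including its supported options and default value.
print(DecisionTreeRegressor().explainParam("impurity"))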

Is there any workaround to define a custom impurity function?

Upvotes: 1

Views: 85

Answers (1)

Sachin Hosmani

Reputation: 1762

This doesn't seem to be possible. I looked into this a few years ago, and nothing appears to have changed since then.

In my case I used a workaround: transform the label to reduce the skew (e.g., apply a log transform), fit the model on the transformed label, and invert the transform at inference time to recover predictions on the original scale.
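Here is a minimal sketch of that workaround. It assumes a training DataFrame train_df and a scoring DataFrame test_df (placeholder names), each with a features vector column and a skewed label column:

from pyspark.sql import functions as F
from pyspark.ml.regression import DecisionTreeRegressor

# Fit on the log-transformed label; log1p keeps zero labels valid.
train_log = train_df.withColumn("log_label", F.log1p(F.col("label")))
dt = DecisionTreeRegressor(featuresCol="features", labelCol="log_label")
model = dt.fit(train_log)

# Invert the transform at inference time so predictions are
# back on the original scale of the label.
preds = (model.transform(test_df)
         .withColumn("prediction_original", F.expm1(F.col("prediction"))))

One caveat: the back-transformed prediction behaves like a geometric mean (closer to the median than the mean of the original label), which for skewed targets is often acceptable or even desirable.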

Another option would be to write your own regression decision tree class that directly uses lower-level Spark APIs and plugs in a custom impurity function; a rough sketch of the split-scoring piece follows.
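PySpark doesn't expose its internal tree-building APIs, so purely as an illustration, here is what the custom-impurity and split-scoring pieces might look like using plain DataFrame aggregations. The impurity here (mean absolute deviation from the median) is a hypothetical skew-robust choice, and all names and columns are placeholders; a real implementation would also enumerate features and thresholds, recurse to build the tree, and pay attention to performance:

from pyspark.sql import functions as F

def mad_impurity(df, label_col="label"):
    # Mean absolute deviation from the median: a skew-robust
    # alternative to variance as a node impurity.
    median = df.approxQuantile(label_col, [0.5], 0.001)[0]
    return df.select(F.avg(F.abs(F.col(label_col) - F.lit(median)))).first()[0]

def split_score(df, feature_col, threshold, label_col="label"):
    # Weighted impurity of the two children produced by the split
    # feature_col <= threshold; lower scores mean better splits.
    left = df.filter(F.col(feature_col) <= threshold)
    right = df.filter(F.col(feature_col) > threshold)
    n, nl, nr = df.count(), left.count(), right.count()
    if nl == 0 or nr == 0:
        return float("inf")
    return (nl / n) * mad_impurity(left, label_col) + (nr / n) * mad_impurity(right, label_col)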

Upvotes: 0
