Tobias Piechowiak

Reputation: 35

Customize regression tree nodes

I built the following regression tree with the rpart package. Because the original variable names were long, I renamed the variables to letters in alphabetical order. Now that the analysis is done, I would like the 4 relevant split variables to be shown under their original long names again. How do I access the split labels? I know this is possible with the rpart.plot package, but I would like to stick with the partykit plot layout because I want to have the boxplots in the nodes.

Any solution to that?

[Image: plot of the regression tree]

Upvotes: 0

Views: 1685

Answers (1)

Achim Zeileis

Reputation: 17183

I would recommend not tweaking this afterwards and instead keeping the variable names in sync. However, to change the labels used in the plots, you only have to change names(party_object$data).

As a simple reproducible example consider the iris data:

library("rpart")
library("partykit")
data("iris", package = "datasets")
names(iris)
## [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"     

Now we change the names in the data to something abbreviated:

names(iris) <- c("SL", "SW", "PL", "PW", "S")

And then grow the rpart() tree and convert it to party:

rp <- rpart(S ~ SL + SW + PL + PW, data = iris)
py <- as.party(rp)
plot(py)

tree1

Then we can simply relabel the variables in $data (note that the order changed, the response is listed first) and plot again:

names(py$data)
## [1] "S"  "SL" "SW" "PL" "PW"
names(py$data) <- c("species", "sepal_length", "sepal_width", "petal_length", "petal_width")
plot(py)

tree2
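The relabeling works because, as far as I understand the partykit internals, each split refers to its variable by column position in $data rather than by name. A minimal sketch to check this on the iris tree from above (the object names root_split and id are only illustrative):

```r
library("rpart")
library("partykit")

data("iris", package = "datasets")
names(iris) <- c("SL", "SW", "PL", "PW", "S")
py <- as.party(rpart(S ~ SL + SW + PL + PW, data = iris))

# Split of the root node and the column index of its split variable
root_split <- split_node(node_party(py))
id <- varid_split(root_split)

# The index is resolved against the current column names of $data
names(py$data)[id]

# After relabeling, the same index picks up the new label
names(py$data) <- c("species", "sepal_length", "sepal_width",
                    "petal_length", "petal_width")
names(py$data)[id]
```

So the plot labels are looked up from whatever names $data carries at plotting time.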

Most things should work completely fine with this tweaked party object. However, the variable names in the formula and the data are now out of sync, which might lead to problems in some settings. Plotting should be fine, though.
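One way to limit the risk, sketched here under the assumption that the long labels are only needed for the plot: relabel a copy of the party object and map the names through a lookup vector instead of relying on column order. The names long_names and py_plot are hypothetical and not part of either package.

```r
library("rpart")
library("partykit")

data("iris", package = "datasets")
names(iris) <- c("SL", "SW", "PL", "PW", "S")
py <- as.party(rpart(S ~ SL + SW + PL + PW, data = iris))

# Hypothetical lookup: short name -> long label used for plotting
long_names <- c(S = "species", SL = "sepal_length", SW = "sepal_width",
                PL = "petal_length", PW = "petal_width")

# Work on a copy so that py keeps names that match its formula
py_plot <- py
old <- names(py_plot$data)

# Rename only the columns found in the lookup, leave all others untouched
hit <- old %in% names(long_names)
old[hit] <- long_names[old[hit]]
names(py_plot$data) <- old

plot(py_plot)
```

The original py can then still be used for anything that depends on the formula, while py_plot serves only for display.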

Upvotes: 2
