JBJ

R - data.tree aggregate along ancestors of a leaf?

Background: Suppose I have a decision tree that contains probabilities for the occurrence of outcomes at its nodes. I need to compute the joint occurrence probability for each final outcome at each leaf.

Method: I am trying to aggregate along the ancestors of each leaf of a tree, using data.tree in R.

Problem: I am a beginner with data.tree and don't know if that's possible at all.

Here is an example (using sum rather than prod as aggregation, since it's a bit easier to compute by hand):

library(data.tree)
set.seed(123)
# Create a tree
thetree <- CreateRegularTree(height = 3, branchingFactor = 2, parent = Node$new("1"))
thetree$Set(p = 1:thetree$totalCount/10)
print(thetree, "p")
#       levelName   p
# 1 1             0.1
# 2  ¦--1.1       0.2
# 3  ¦   ¦--1.1.1 0.3
# 4  ¦   °--1.1.2 0.4
# 5  °--1.2       0.5
# 6      ¦--1.2.1 0.6
# 7      °--1.2.2 0.7

I tried the Aggregate function:

# But this sums the p values of each node's direct children (leaves keep their own p),
# rather than summing along the path from the root
thetree$Do(function(x) x$result <- Aggregate(x, "p", sum))
print(thetree, "p", "result")

#       levelName   p result
# 1 1             0.1    0.7
# 2  ¦--1.1       0.2    0.7
# 3  ¦   ¦--1.1.1 0.3    0.3
# 4  ¦   °--1.1.2 0.4    0.4
# 5  °--1.2       0.5    1.3
# 6      ¦--1.2.1 0.6    0.6
# 7      °--1.2.2 0.7    0.7

I also tried the argument traversal = "ancestor", without success.

My desired result aggregates along the path from the root to each node. For leaf 1.1.1, for example, that is 0.3 + 0.2 + 0.1 = 0.6.

# Desired result
#       levelName   p result
# 1 1             0.1    NA
# 2  ¦--1.1       0.2    0.3
# 3  ¦   ¦--1.1.1 0.3    0.6
# 4  ¦   °--1.1.2 0.4    0.7
# 5  °--1.2       0.5    0.6
# 6      ¦--1.2.1 0.6    1.2
# 7      °--1.2.2 0.7    1.3
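A possible alternative, sketched here and not tested against every data.tree version: traversal = "ancestor" is an option of the Get method (rather than of Aggregate), where it collects a value from the node itself and each of its ancestors up to the root. Summing those values for every non-root node should reproduce the desired result above. This is a minimal sketch that rebuilds the tree from the question and reuses its attribute names p and result.

```r
library(data.tree)

# Rebuild the example tree from the question
thetree <- CreateRegularTree(height = 3, branchingFactor = 2, parent = Node$new("1"))
thetree$Set(p = 1:thetree$totalCount/10)

# For every non-root node, collect p from the node and all its ancestors
# (traversal = "ancestor" walks upward to the root) and sum those values
thetree$Do(function(node) {
  node$result <- sum(node$Get("p", traversal = "ancestor"))
}, filterFun = isNotRoot)

print(thetree, "p", "result")
```

Because the root is filtered out, it never gets a result, which matches the NA in the desired output.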

Answers (1)

Christoph Glur

For this, the Do function comes in handy:

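# Start from the root's own p, then visit all non-root nodes top-down,
# adding each node's p to its parent's running total
thetree$result <- thetree$p
traversal <- Traverse(thetree, filterFun = isNotRoot)
Do(traversal, function(node) node$result <- node$parent$result + node$p)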

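This gives the desired result, except that the root keeps its own p (0.1) instead of NA. It works because Traverse visits nodes in pre-order by default, so each node's parent already has its result when the node itself is visited: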

print(thetree, "p", "result")
      levelName   p result
1 1             0.1    0.1
2  ¦--1.1       0.2    0.3
3  ¦   ¦--1.1.1 0.3    0.6
4  ¦   °--1.1.2 0.4    0.7
5  °--1.2       0.5    0.6
6      ¦--1.2.1 0.6    1.2
7      °--1.2.2 0.7    1.3
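Since the original goal is a joint probability rather than a sum, the same top-down pattern should carry over by swapping + for *. A minimal sketch, assuming p holds each node's probability conditional on its parent; the attribute name joint and the variable leafJoint are made up for illustration, and the numbers are the toy values from the question rather than a valid probability distribution. The last step keeps only the leaves, where the final outcomes sit.

```r
library(data.tree)

# Example tree from the question; p stands in for conditional probabilities
thetree <- CreateRegularTree(height = 3, branchingFactor = 2, parent = Node$new("1"))
thetree$Set(p = 1:thetree$totalCount/10)

# Seed the root, then multiply down the tree in pre-order so that each
# parent's joint value is already set when its children are visited
thetree$joint <- thetree$p
Do(Traverse(thetree, filterFun = isNotRoot),
   function(node) node$joint <- node$parent$joint * node$p)

# Joint probabilities of the final outcomes (leaves only), named by node
leafJoint <- thetree$Get("joint", filterFun = isLeaf)
print(leafJoint)
```

For leaf 1.1.1 this yields 0.1 * 0.2 * 0.3 = 0.006.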