user2850249
user2850249

Reputation: 169

How to maintain a property about trees using a Scala Collections TreeMap (or similar)

Since the scala RedBlack tree is no longer available in the scala collections, I'm having trouble figuring out how to maintain properties about tree in the nodes on a scala tree, as I need direct access to the tree rather than the encaspulated interface provided by the standard collections.

For example, I want a tree that maintains on each node a field x such that the function f holds for each node n such that

f(null) = 0
f(n) = n.x + f(n.left) + f(n.right)

Can I efficiently use a Scala standard library TreeMap for this or will I have to implement my own tree (currently doing this)?

Upvotes: 1

Views: 78

Answers (1)

Rüdiger Klaehn
Rüdiger Klaehn

Reputation: 12565

Unfortunately, I don't think you will be able to do this using TreeMap. You might want to just copy the red black tree from scala collections and add a summary field.

Alternatively, you might want to use one of the tree-based collections that allow specifying a "measure". E.g. https://github.com/Sciss/FingerTree

Or, you could use this horrible hack to add some summary data to the nodes of e.g. a TreeSet. Note the namespace scala.collection.immutable so you can access RedBlackTree from inside the utility.

package scala.collection.immutable

trait Summary[E, S] {
  def apply(v: E): S
  def empty: S
  def combine(a: S, b: S): S
  def combine3(a: S, b: S, c: S) = combine(combine(a, b), c)
}

object TreeSetSummarizer {

  private[this] val treeSetAccessor = classOf[scala.collection.immutable.TreeSet[_]].getDeclaredField("tree")
  treeSetAccessor.setAccessible(true)

  private def tree[K](set: TreeSet[K]): AnyRef = treeSetAccessor.get(set) match {
    case t: RedBlackTree.Tree[K, Unit] ⇒ t
    case _ ⇒ "null"
  }

  private type JFunction[T, R] = java.util.function.Function[T, R]

  def apply[K, S](implicit s: Summary[K, S]): (TreeSet[K] ⇒ S) = new TreeSetSummarizer[K, S]
}

class TreeSetSummarizer[K, S](implicit summary: Summary[K, S]) extends (TreeSet[K] ⇒ S) {
  import TreeSetSummarizer._

  // this should be a guava cache using weak keys to prevent memory leaks
  private val memo = new java.util.IdentityHashMap[AnyRef, S]()

  private val f: JFunction[AnyRef, S] = new JFunction[AnyRef, S] {
    def apply(t: AnyRef): S = t match {
      case t: RedBlackTree.Tree[K, Unit] ⇒ summary.combine3(apply0(t.left), summary.apply(t.key), apply0(t.right))
      case _ ⇒ summary.empty
    }
  }

  private def apply0(set: AnyRef): S =
    memo.computeIfAbsent(set, f)

  def apply(set: TreeSet[K]): S =
    apply0(tree(set))
}

Here is how you would use it

import scala.collection.immutable._

// create a large TreeSet
val largeSet = TreeSet(0 until 10000: _*)
// define your summary
implicit val s = new Summary[Int, Long] {
  def empty = 0L
  def apply(x: Int) = x
  def combine(a: Long, b: Long) = a + b
}
// define your summarizer. You need to keep the summarizer instance for
// having a performance benefit, since it internally stores summaries for
// tree nodes in an identity hash map
val summary = TreeSetSummarizer.apply[Int, Long]

// summarize something
println(summary(largeSet))

// summarize a modified set. This is fast because the summaries for tree
// nodes are being cached.
val largeSet1 = largeSet - 5000
println(summary(largeSet1))

Note that given the reflection and hashing overhead, this is probably best used for more computationally intensive summaries than a simple sum.

Update: I have now written a small library persistentsummary to define persistent summaries for existing scala collections. This should do exactly what you want.

Upvotes: 1

Related Questions