davidchambers
davidchambers

Reputation: 24866

What is the data structure behind Clojure's sets?

I recently listened to Rich Hickey's interview on Software Engineering Radio. During the interview Rich mentioned that Clojure's collections are implemented as trees. I'm hoping to implement persistent data structures in another language, and would like to understand how sets and Clojure's other persistent data structures are implemented.

What would the tree look like at each point in the following scenario?

  1. Create the set {1 2 3}

  2. Create the union of {1 2 3} and {4}

  3. Create the difference of {1 2 3 4} and {1}

I'd like to understand how the three sets generated ({1 2 3}, {1 2 3 4}, and {2 3 4}) share structure, and how "deletions" are handled.

I'd also like to know the maximum number of branches that a node may have. Rich mentioned in the interview that the trees are shallow, so presumably the branching factor is greater than two.

Upvotes: 19

Views: 5462

Answers (4)

amalloy
amalloy

Reputation: 92147

I like SCdF's drawings and explanations, but if you're looking for more depth you should read the excellent series of articles on Clojure's data structures at Higher-Order. It explains in detail how Clojure's maps work, and Clojure's sets are just a thin layer on top of its maps: #{:a :b} is implemented as a wrapping around {:a :a, :b :b}.

Upvotes: 8

Rodrigo Taboada
Rodrigo Taboada

Reputation: 2727

You probably need to read the work of Phil Bagwell. His research into data structures is the base of Clojure, Haskell and Scala persistent data structures.

There is this talk by Phil at Clojure/Conj: http://www.youtube.com/watch?v=K2NYwP90bNs

There are also some papers:

You can also read Purely Functional Data Structures by Chris Okasaki. This blog post talks about the book: http://okasaki.blogspot.com.br/2008/02/ten-years-of-purely-functional-data.html

Upvotes: 24

SCdF
SCdF

Reputation: 59428

You should really read Clojure Programming, it covers this in great detail, including pictures. Briefly though, collections are depth first searches through trees. We can show your examples like this:

(def x #{1 2 3})

x
|
| \
|\ 3
1 \
   2

(def y (conj x 4))

 x  y
 | / \
 | \   4
 |\ 3
 1 \
    2

 (def z (difference y #{1}))

 x  y
 | / \
 | \  4
 |\ 3
 1/\
z-  2

Note that these are just indicative, I'm not saying that this is exactly the layout Clojure uses internally. It's the gist though.

Upvotes: 11

gtrak
gtrak

Reputation: 5676

Here's a starting point: https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/PersistentHashSet.java

You can see it's implemented in terms of PersistentHashMap.

Upvotes: 0

Related Questions