Serialize persistent data structures in Clojure

Question

We all know that Rich Hickey uses an ideal hash tree-based method to implement the persistent data structures in Clojure. This structure lets us manipulate the persistent data structures without copying much.

But I cannot seem to find the correct way to serialize this specific structure. For example, given:

(def foo {:a :b :c :d})
(def bar (assoc foo :e :f))
(def bunny {:foo foo :bar bar})

My question is:

How can I serialize the bunny such that the contents of foo, i.e. :a mapping to :b and :c mapping to :d, appear only once in the serialized content? It's like dumping a memory image of the structures. It's also like serializing the "internal nodes" as well as the "leaf nodes" referenced here.

P.S. In case this is relevant, I am building a big DAG (directed acyclic graph) where we assoc quite a bit to link nodes to other nodes, and I want to serialize the DAG for later deserialization. The expanded representation of the graph (i.e., the content you get when printing the DAG in the REPL) is unacceptably long.

Frank C. · Accepted Answer

Davyzhu,

A few things first:

Without a tokenization strategy, the printed DAG will be as long as the DAG is. If foo is referenced one or more times, each reference is fully realized (i.e. displayed) in turn during printing.
For interchange of the information (serializing and deserializing), the approach depends largely on your goals. For example, if you are serializing to send it over the wire, you will either want to do it fully (like the printed representation) or you will need to encode individual data points with some identification/tokenization strategy. The latter, of course, assumes the receiving end can deserialize with an understanding of the tokenization protocol.
A tokenization strategy, which could perhaps use Clojure's metadata facilities, would require encoding a unique key for each referenced content block, so that your DAG contains nodes whose edges are represented by those keys.
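To make the metadata idea concrete, here is a minimal sketch and not a definitive implementation. It assumes a convention I am making up for illustration: every shared node carries a :token entry in its metadata. The tokenize function and the :left/:right/:again keys are hypothetical names, not part of any library.

```clojure
(require '[clojure.walk :as w])

;; Assumption: shared nodes are tagged with a hypothetical :token metadata key.
(def foo (with-meta {:a :b :c :d} {:token :foo}))
(def bar (with-meta {:e :f} {:token :bar}))
(def bunny {:left foo :right bar :again foo})

(defn tokenize
  "Replaces every tagged node in graph with its token and collects the
  tagged nodes once each. Returns {:dictionary {token node} :graph graph'}."
  [graph]
  (let [dict   (atom {})
        graph' (w/prewalk
                 (fn [x]
                   ;; meta returns nil for values that cannot carry metadata
                   (if-let [t (:token (meta x))]
                     (do (swap! dict assoc t x) t)
                     x))
                 graph)]
    {:dictionary @dict :graph graph'}))

(def tokenized (tokenize bunny))
;; (:graph tokenized)      => {:left :foo, :right :bar, :again :foo}
;; (:dictionary tokenized) => {:foo {:a :b, :c :d}, :bar {:e :f}}
```

Two caveats with this sketch: tokens are plain keywords, so they can collide with ordinary keywords in your data, and a tagged node nested inside another tagged node is stored expanded inside its parent's dictionary entry.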

Edit: Modified since the original post to clarify as per the comments, but the example does not reflect the hierarchical nature of the DAG.

A contrived example:

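(require '[clojure.walk :as w])

;; Content blocks, each stored exactly once
(def node1 {:a :b :c :d})
(def node2 {:e :f})
;; Maps each token (key) to the content block it stands for
(def dictionary {:foo node1 :bar node2})

;; The DAG refers to content blocks by token only
(def DAG [:bunny [:foo :bar]])

(println DAG) ; => [:bunny [:foo :bar]]

(defn expand-dag1
  "Replaces a known token with its content block; anything else is returned unchanged."
  [x]
  (if (keyword? x)
    (get dictionary x x)
    x))

(println (w/postwalk expand-dag1 DAG)) ; => [:bunny [{:a :b, :c :d} {:e :f}]]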

Note: Use of vectors, maps, lists, etc. to express your DAG is up to you.
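For the actual serialize/deserialize step, one possible sketch (my assumption, not the only option) is to print the dictionary and the tokenized DAG together with pr-str and read them back with clojure.edn/read-string. The payload layout and the expand function below are hypothetical names for illustration.

```clojure
(require '[clojure.edn :as edn]
         '[clojure.walk :as w])

(def node1 {:a :b :c :d})
(def node2 {:e :f})

;; Assumed payload layout: the dictionary holds each node once,
;; and the DAG refers to nodes by key, here :foo twice.
(def payload {:dictionary {:foo node1 :bar node2}
              :dag        [:bunny [:foo :bar :foo]]})

;; Serialize: the contents of node1 are written once in the dictionary.
(def serialized (pr-str payload))

(defn expand
  "Replaces each keyword found in the dictionary with its node."
  [{:keys [dictionary dag]}]
  (w/postwalk #(if (keyword? %) (get dictionary % %) %) dag))

;; Deserialize, then expand the tokens back into nodes.
(def restored (expand (edn/read-string serialized)))
;; restored => [:bunny [{:a :b, :c :d} {:e :f} {:a :b, :c :d}]]
```

After expansion, both occurrences of :foo point at the same map object from the deserialized dictionary, so the shared node is again held in memory once rather than copied per reference.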
