Reputation: 1109
We all know that Rich uses a ideal hash tree-based method to implement the persistent data structures in Clojure. This structure enables us to manipulate the persistent data structures without copying a lot.
But it seems I cannot find the correct way to serialize this specific structure. For example given:
(def foo {:a :b :c :d})
(def bar (assoc foo :e :f))
(def bunny {:foo foo :bar bar})
My question is:
How can I serialize the bunny
such that the contents of foo
, i.e. :a
mapping to :b
and :c
mapping to :d
, appear only once in the serialized content? It's like dumping a memory image of the structures. It's also like serializing the "internal nodes" as well as the "leaf nodes" referenced here.
P.S. In case this is relevant, I am building a big DAG (directed acyclic graph) where we assoc
quite a bit to link these nodes to those nodes, and want to serialize the DAG for later de-serialization. The expanded representation of the graph (i.e., the content one'll get when printing the DAG in repl) is unacceptably long.
Upvotes: 4
Views: 407
Reputation: 8088
Davyzhu,
Few things first:
foo
is referenced 1 or more times each will be fully realized (i.e. displayed) in turn during printing.Edit:: Modified since original post to clarify as per comments but the example does not reflect the hierarchical nature of the DAG.
A contrived example:
(def node1 {:a :b :c :d})
(def node2 {:e :f})
(def dictionary {:foo node1 :bar node2})
(def DAG [:bunny [:foo :bar]])
(println DAG) ; => [:bunny [:foo :bar]]
(defn expand-dag1
[x]
(if (keyword? x)
(get dictionary x x)
x))
(println (w/postwalk expand-dag1 DAG)) ; => [:bunny [{:a :b, :c :d} {:e :f}]]
Note: Use of vectors, maps, lists, etc. to express your DAG is up to you.
Upvotes: 1
Reputation: 24750
This is one option (that will not work in Clojurescript, in case that matters) and in general may be perceived as a bad idea, but it is worth mentioning anyway.
If I understand your question, you want the foo
in (def bunny {:foo foo :bar bar})
to not be "pasted" as a full copy, but rather retain a "reference" to the (def foo..)
such that the original foo
map is only serialized once.
One technique I would consider though not necessarily encourage (and only after exhausting other options such as a reorginization of your data structure as hinted by Frank C.), is to serialize the code for bunny
rather than the structure itself. Then you read the code string back in and eval
it. This would only work if the structure for bunny does not change, or if it does, that you can easily build a string of the bunny map with the relevant symbols included as part of the string, rather than the contents of those symbols.
But a much better idea would be to serialize your "raw" data structures only, like the maps foo
and bar
, then build your bunny
after these are read back in -- by also serializing the structure but not the contents of bunny. I believe this is what Frank's answer is getting at.
Worth noting that if the structure of bunny does change dynamically, and you are able to create a string of symbols as suggested in 1. above, then that means you also have the tools to instead build a representation of bunny as in 2. above, which would be preferable.
Since code is data, option 1. is an example of the type of flexibility available to us as lisp programmers -- but that doesn't mean there are not better options.
Upvotes: 1