Reputation: 1295
I'ver never used Tcl's dict
before. While the interface looks easy enough to use, I'm totally surprised by its design and don't have a clue how this can work internally. Could someone please enlighten me? I'm probably missing something fundamental.
I'm familiar with Tcl's internal hash tables at the C level. There all actions refer to a Tcl_HashTable
variable. For dict
I would also have expected an internal variable at the C level which is involved when the dictionary is updated or consulted. I would therefore understand if a dictionary name at the Tcl level would be used to refer to the internal hash variable. However, dict create
returns a list, and dict get
receives a list as value argument. So, instead of a dictionary name, list are passed around as values.
This also raises the question: At which point is the internal hash table built or updated? At https://wiki.tcl-lang.org/page/dict I read "The conversion between internal representations of a dictionary and a list is lossless." This is even more confusing: Why would a list be implemented as a hash table?
(And a side question: Are there any limits on what is allowed as keys and values? Using well-formed lists seems to work, at least.)
Thanks!
Upvotes: 0
Views: 969
Reputation: 137717
Internally, Tcl's dictionaries are a special kind of Tcl_HashTable
that's been enhanced so that the insertion order of keys is preserved. (The hash table entries are additionally kept in a doubly linked list at a cost of storage for two pointers extra per entry.) They also use arbitrary Tcl_Obj
values as keys and values; the values are never interpreted at all (other than to be serialized as strings if the overall dictionary is serialized) and the keys will always have their standard UTF-8 representation computed for hashing purposes, but otherwise can be absolutely any value at all. (Yes, this includes binary data, lists, other dicts, anything.) The only exception is that you can't put a dictionary into itself, either directly or indirectly; the act of trying to do so will make the container change so that it is a modified copy of the containee. (This is a consequence of how Tcl manages its copy-on-(shared)-write value semantics.)
Standard serialization of dicts is done by iterating over the sequence of entries and issuing the quoted elements for each key and value in turn. This is done using the same engine used for serializing lists. Parsing a string to get a dictionary also shares the underlying basic string parser with lists. It is a consequence of this sharing that all lists with even numbers of elements can be interpreted as dictionaries, but doing so might deduplicate elements; if the elements in the indices that correspond to keys are all unique, the conversion from list to dict and back to list is lossless on an element-wise basis (and for lists in canonical form, absolutely lossless). The dict→list→dict round trip is also lossless (including absolutely when starting in canonical form). Because of this high degree of compatibility, there's additionally extra code in the type conversion code to recognise when a dictionary is being made from something that is a list and skip actually building the string; the elements are processed directly. Vice versa, going dictionary to list will also skip the costly string generation. This is only safe because we know that the parsing of basic elements is identical between the two types but that's part of the fundamental design.
The underlying hash table is the main part of the Dict
structure, which is held as the internal representation of a dictionary. There's a multi-layer reference management system in place, just as in the corresponding List
structure, but unless you're poking deep into Tcl's guts, you really don't need to care about that.
dict create
really does return a dictionary.
% set d [dict create a b c d]
a b c d
% tcl::unsupported::representation $d
value is a dict with a refcount of 4, object pointer at 0x7faf35825c00, internal representation 0x7faf3503d910:0x0, string representation "a b c d"
% tcl::unsupported::disassemble script {set d [dict create a b c d]}
ByteCode 0x0x7faf36053c10, refCt 1, epoch 17, interp 0x0x7faf36019010 (epoch 17)
Source "set d [dict create a b c d]"
Cmds 2, src 27, inst 8, litObjs 2, aux 0, stkDepth 3, code/src 0.00
Commands 2:
1: pc 0-6, src 0-26 2: pc 2-5, src 7-25
Command 1: "set d [dict create a b c d]"
(0) push1 0 # "d"
Command 2: "dict create a b c d..."
(2) push1 1 # "a b c d"
(4) dup
(5) verifyDict
(6) storeStk
(7) done
(The dup
/verifyDict
is being used to force dictionary-ness of the literal. The compiler already knows that it is a dictionary, but it's forcing the type because representation-sensitive code — while strongly discouraged — definitely exists out there in the wild.)
Upvotes: 2