Reputation: 107
Can anyone explain the differences between scan and binary scan, and between format and binary format? I am getting confused by the binary commands.
Upvotes: 0
Views: 3878
Reputation: 137587
The format command assembles strings of characters; the binary format command assembles strings of bytes. The scan and binary scan commands do the reverse, extracting information from character strings and byte strings respectively.
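A minimal sketch of the two pairs side by side may make the distinction concrete. The field layouts and values here are made up purely for illustration:

```tcl
# format builds a character string; scan parses one back.
set text [format "%05d|%-4s|%.2f" 42 ab 3.14159]
puts $text                        ;# 00042|ab  |3.14
scan "x=17 y=25" "x=%d y=%d" x y
puts "$x $y"                      ;# 17 25

# binary format builds a byte string; binary scan unpacks one.
# S = 16-bit big-endian integer, a3 = 3 raw bytes, c = 8-bit integer
set bytes [binary format "Sa3c" 258 abc 65]
puts [string length $bytes]       ;# 6 (2 + 3 + 1 bytes)
binary scan $bytes "Sa3c" num str ch
puts "$num $str $ch"              ;# 258 abc 65
```

Note that the format specifiers of the two families are unrelated: `%d` in format means "decimal text", while `S` in binary format means "two bytes, most significant first".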
Note that Tcl happens to map byte strings neatly onto character strings where the characters are in the range \u0000–\u00FF, and there are other operations for getting information into and out of binary strings that are sometimes relevant. Most notably, encoding convertto and encoding convertfrom: encoding convertto formats a string as a sequence of bytes that represent that string in a given encoding (an operation which can lose information) and encoding convertfrom goes in the opposite direction.
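As a rough illustration of that round trip, the following sketch converts a short string to UTF-8 bytes and back. The sample text is arbitrary, and binary scan with H* is used only to display the bytes as hex:

```tcl
set s "caf\u00e9"                          ;# 4 characters, the last one is U+00E9
set utf8 [encoding convertto utf-8 $s]     ;# byte string; U+00E9 becomes C3 A9
puts [string length $s]                    ;# 4
puts [string length $utf8]                 ;# 5
binary scan $utf8 H* hex
puts $hex                                  ;# 636166c3a9

# Going the other way recovers the original characters.
set back [encoding convertfrom utf-8 $utf8]
puts [string equal $s $back]               ;# 1
```

The result of encoding convertto is a byte string, so string length on it counts bytes rather than characters.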
So what encoding are Tcl's strings really in? Well, none really. Or many. The logical level works with character sequences exclusively, and the implementation will actually move things back and forth (mostly between a variant of UTF-8 and UCS-2, though with optimisations for handling byte strings via arrays of unsigned char) as necessary. While this is not always perfectly efficient, most code never notices what's going on due to the type-caching used.
If you have Tcl 8.6, you can peek behind the covers to observe the types with an unsupported command:
# Output is human-readable; experiment to see what it says for you
puts [tcl::unsupported::representation $MyString]
Don't use this to base functional decisions on; Tcl is very happy to mutate types out from under your feet. But it can help when finding out why your code is unexpectedly slow. (Note also that types attach to values, and not to variables.)
Upvotes: 0
Reputation: 55473
To understand the difference between command sets manipulating binary and string data you have to understand the distinction between these two kinds of data.
In Tcl, as in many (most?) high-level languages, strings are rather abstract — that is, they are described in pretty high-level terms. Particularly in Tcl, strings are defined to have the following properties:
Note that many things are left out from this definition:
NUL-terminated arrays? linked lists of unsigned longs? something else?).
(To put it into a more interesting perspective, Tcl is able to transparently change the underlying representations of strings it manages, between UTF-8 and UTF-16 encoded sequences. But here we're talking about the reference Tcl implementation, and other implementations (such as Jacl for instance) are free to do something else completely.)
The same approach is used to manipulate all the other kinds of data in the Tcl interpreter. Say, integer numbers are stored using native platform "integers" (roughly "as in C") but they are transparently upgraded into arbitrary sized integers if an arithmetic operation is about to overflow the platform-sized result.
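The integer case is easy to observe from a script. This small sketch assumes Tcl 8.5 or later, where arbitrary-precision integers are available:

```tcl
set big [expr {2**62}]           ;# still fits in a 64-bit integer
puts $big                        ;# 4611686018427387904
set bigger [expr {$big * 4}]     ;# 2**64, no longer fits in 64 bits
puts $bigger                     ;# 18446744073709551616
```

The script never asks for a wider type; the switch happens behind the scenes, much as it does for string representations.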
So long as you don't leave the comfortable world of the Tcl interpreter, this is all you should know about the data types it manages. But now there's the outside world. In it, abstract concepts such as Tcl strings do not exist. Say, if you need to communicate with some other program over a network socket, through a file, or via any other medium, you have to get down to the level of exact layouts of raw bytes, which are described by "wire protocols", file formats or whatever applies to your case. This is where "binaries" come into play: they allow you to precisely specify how the data is laid out so that it's ready to be transferred to the outside world or consumed from it. binary format makes these "binaries" and binary scan reads them.
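To make this concrete, here is a sketch of packing and unpacking a message for an invented wire format. The layout (a 1-byte type, a 2-byte big-endian payload length, then the payload as UTF-8) and the procedure names packFrame and unpackFrame are hypothetical and do not correspond to any real protocol:

```tcl
# Hypothetical frame layout: 1-byte type, 2-byte big-endian length, payload.
proc packFrame {type text} {
    # Turn the abstract string into concrete bytes first.
    set payload [encoding convertto utf-8 $text]
    return [binary format "cSa*" $type [string length $payload] $payload]
}

proc unpackFrame {frame} {
    binary scan $frame "cS" type len
    set len [expr {$len & 0xFFFF}]          ;# S scans as signed; make it unsigned
    binary scan $frame "x3a$len" payload    ;# skip the 3 header bytes, read payload
    return [list $type [encoding convertfrom utf-8 $payload]]
}

set frame [packFrame 7 "h\u00e9llo"]
puts [string length $frame]                 ;# 9 (1 + 2 + 6 bytes)
set decoded [unpackFrame $frame]
puts [lindex $decoded 0]                    ;# 7
```

The string has five characters but occupies six bytes on the wire, which is exactly the kind of detail that only matters once data leaves the interpreter.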
Note that certain Tcl commands for working with the outside world are "smart by default". For instance, the open command, which opens files, by default assumes they are textual and encoded in the default system encoding (which is deduced, broadly speaking, from the environment). You can then use the chan configure command (or fconfigure in older versions of Tcl) to either change this encoding or completely inhibit conversions by specifying that the channel is in "binary mode". The same applies to EOL conversions.
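A short sketch of binary mode in practice follows. The file name demo.bin is a hypothetical scratch file in the current directory, and the four bytes written are arbitrary:

```tcl
set path [file join [pwd] demo.bin]        ;# hypothetical scratch file

# Write four raw bytes with all encoding and EOL translation switched off.
set out [open $path w]
chan configure $out -translation binary
puts -nonewline $out [binary format H* 0d0aff00]
close $out

# Read them back, again without any conversion.
set in [open $path r]
chan configure $in -translation binary
set data [read $in]
close $in

binary scan $data H* hex
puts $hex                                  ;# 0d0aff00
file delete $path
```

For a text file in a known encoding, the analogous call would be something like chan configure $in -encoding utf-8, which keeps the conversions but picks the encoding explicitly.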
Note also that there are specialized packages for Tcl that effectively hide the complexities of working with a particular wire/file format. To present one example, the tdom package works with XML; when you manipulate XML using this package, you're not concerned with how exactly XML must be represented when, say, saved to a file; you just work with tdom's objects, native Tcl strings etc.
Upvotes: 2
Reputation: 3908
The docs are pretty good and contain examples:
Maybe you could ask a more specific question?
Upvotes: 0