I am trying to create a binary message to send over a socket, but I'm having trouble with the way TCL treats all variables as strings. I need to calculate the length of a string and know its value in binary.
set length [string length $message]
set binaryMessagePart [binary format s* { $length 0 }]
However, when I run this I get the error 'expected integer but got "$length"'. How do I get this to work and return the value for the integer 5 and not the char 5?
To calculate the length of a string, use the string length command. To calculate the length of a string in a particular encoding, convert the string to that encoding and then use string length:
set enc "utf-8"; # Or whatever; you need to know this ahead of time for sanity's sake
set encoded [encoding convertto $enc $message]
set length [string length $encoded]
Note that the encoded length is in bytes, whereas the length before encoding is in characters. For some messages and some encodings, the difference can be substantial.
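A quick way to see the difference is to compare both lengths for a string containing one non-ASCII character. This is a minimal sketch; the sample string and the choice of UTF-8 are assumptions for illustration.

```tcl
# "h\u00e9llo" is "héllo"; the é (U+00E9) takes two bytes in UTF-8
set message "h\u00e9llo"
set chars [string length $message]
set bytes [string length [encoding convertto utf-8 $message]]
puts "$chars characters, $bytes bytes"   ;# 5 characters, 6 bytes
```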
To compose a binary message with the length and the body of the message (a fairly common binary format), use the binary format command like this:
# Assumes the length is big-endian; for little-endian, use i instead of I
set binPart [binary format "Ia*" $length $encoded]
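Putting the pieces together, here is a sketch of a length-prefixed frame plus the matching decoder built with binary scan. The proc names frameMessage and unframeMessage are hypothetical, and the layout (4-byte big-endian length, then the encoded body) is the one assumed above.

```tcl
# Hypothetical helper: 4-byte big-endian byte count, then the encoded body
proc frameMessage {message {enc utf-8}} {
    set encoded [encoding convertto $enc $message]
    return [binary format Ia* [string length $encoded] $encoded]
}

# Hypothetical helper: reverse of frameMessage
proc unframeMessage {frame {enc utf-8}} {
    # Iu reads the prefix as an unsigned 32-bit big-endian integer
    if {[binary scan $frame Iu length] != 1} {
        error "frame is shorter than the 4-byte length prefix"
    }
    # x4 skips the prefix; a$length takes exactly that many body bytes
    if {[binary scan $frame x4a$length body] != 1} {
        error "frame body is shorter than its declared length"
    }
    return [encoding convertfrom $enc $body]
}

set frame [frameMessage "hello"]
binary scan $frame H* hex
puts $hex                      ;# 0000000568656c6c6f
puts [unframeMessage $frame]   ;# hello
```

To send the frame over a socket (here an assumed channel $sock), put the channel in binary mode first with fconfigure $sock -translation binary so Tcl neither translates line endings nor re-encodes the bytes, then write it with puts -nonewline $sock $frame followed by flush $sock.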
What you were doing wrong was using s*, which consumes a list of integers and produces a sequence of little-endian 16-bit (short) integers in the output string, while feeding it a list that was literally $length 0. Braces prevent variable substitution, so the first element is the literal string $length, and that is not an integer, since integers don't start with $. You could instead have written [list $length 0] to produce the argument to s*, and that would have worked, but it doesn't seem quite right for the context of the question.
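To make the difference concrete, this sketch contrasts the braced literal from the question with [list $length 0]. The value 5 and the variable names are just for illustration.

```tcl
set length 5

# Braces suppress substitution: the first element is the literal text "$length"
puts [lindex { $length 0 } 0]     ;# $length
# ...which is why binary format rejects it (the error from the question)
set failed [catch {binary format s* { $length 0 }} err]
puts $err

# The arguments to [list ...] are substituted first, so the list holds 5 and 0
set shorts [binary format s* [list $length 0]]
binary scan $shorts H* hex
puts $hex                         ;# 05000000

# For a single value, skipping the list entirely is simpler
set one [binary format s $length]
binary scan $one H* hex1
puts $hex1                        ;# 0500
```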
In binary format, these are the common format letters (there are many more):

- a is for string data (mnemonically “ASCII”); this is binary string data, so you need to encode it first.
- i and I are for 32-bit numbers (mnemonically “int” as in many programming languages, but especially C). Upper case is big-endian, lower case is little-endian; the same rule applies to s/S and w/W.
- s and S are for 16-bit numbers (mnemonically “short”).
- c is for 8-bit numbers (mnemonically “char”, from C).
- w and W are for 64-bit numbers (mnemonically “wide integers”).
- f and d are for IEEE binary floating-point numbers (mnemonically “float” and “double” respectively, so 4 and 8 bytes), written in the machine's native byte order.

All of them can be followed by an optional count, either a number or *. For the numeric formats, a count makes the letter insert a list of numbers instead of a single one (and so consume a list): a number takes exactly that many elements, and * takes “all the list”. For the string format, a number uses a fixed number of bytes in the message (truncating or padding with zero bytes as necessary), and * takes “all the string” (never truncating or padding).
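The following sketch prints the bytes each of these formats produces, so the endianness and count rules are visible. The hex proc is a hypothetical helper added only to display binary strings.

```tcl
# Hypothetical helper: show a binary string as hex so the bytes are visible
proc hex {bytes} {
    binary scan $bytes H* h
    return $h
}

puts [hex [binary format I 1]]         ;# 00000001 (big-endian 32-bit)
puts [hex [binary format i 1]]         ;# 01000000 (little-endian 32-bit)
puts [hex [binary format S 258]]       ;# 0102 (big-endian 16-bit)
puts [hex [binary format c 65]]        ;# 41
puts [hex [binary format W 1]]         ;# 0000000000000001
puts [hex [binary format s2 {1 2 3}]]  ;# 01000200 (count 2: the 3 is ignored)
puts [hex [binary format s* {1 2 3}]]  ;# 010002000300
puts [hex [binary format a5 abc]]      ;# 6162630000 (padded with zero bytes)
puts [hex [binary format a2 abc]]      ;# 6162 (truncated)
puts [hex [binary format a* abc]]      ;# 616263
```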