user3365107

Reputation: 238

TCL and splitting parts of string into integers

I have a string and I want to convert parts of it to separate integer variables. For example, I have the string: "some text,0x0110, 0xa0, 0xff, 0x02"

From this I want var1=0x02, var2=0xff, var3=0xa0, var4=0x02.

Does anyone have experience with Tcl and strings who can help me?

Upvotes: 1

Views: 1072

Answers (2)

Peter Lewerin

Reputation: 13252

Assuming

set str "some text,0x0110, 0xa0, 0xff, 0x02"

If you only want the values, you can use this command, which returns a list of values:

scan $str "some text,%x,%x,%x,%x"
# -> 272 160 255 2

(It asks the scan command to find and extract four fields of hexadecimal values separated by commas (with optional whitespace) and preceded by a prefix string.)

If you want to assign these values to variables directly, pass the variable names after the format string. In this form scan returns the number of fields read, which is worth checking, since any number other than the number of fields you asked for indicates that something has gone wrong:

scan $str "some text,%x,%x,%x,%x" var1 var2 var3 var4
# -> 4
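To make the point about the return value concrete, here is a minimal sketch that wraps the call and fails loudly on a short read. The proc name parseFour and the error message are my own inventions for illustration, not anything built in:

```tcl
# Hypothetical helper: read exactly four hex fields after the "some text," prefix.
proc parseFour {str} {
    # scan returns the number of conversions it managed to perform
    set n [scan $str "some text,%x,%x,%x,%x" v1 v2 v3 v4]
    if {$n != 4} {
        error "expected 4 hex fields, got $n"
    }
    return [list $v1 $v2 $v3 $v4]
}

set str "some text,0x0110, 0xa0, 0xff, 0x02"
puts [parseFour $str]

# A string with too few fields is rejected instead of leaving variables unset
set failed [catch {parseFour "some text,0x01, 0x02"} msg]
puts "failed=$failed"
```

Whether to raise an error or return a default is a design choice; the only point here is that the count is checked before the variables are used.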

If you want to store the values as hexadecimal literals, this command might do:

scan $str {some text, %[^,], %[^,], %[^,], %[^,]} var1 var2 var3 var4

(It specifies that the fields should consist of any character except a comma, otherwise it's the same as before. In this case, the whitespace after the comma needs to be specified in the format string, before each field specifier, because unlike %x the %[...] conversion does not skip leading whitespace by itself. The space character in the format indicates that zero or more space, tab, or newline characters should be skipped. The braces are necessary to prevent Tcl from interpreting the square brackets as embedded commands.)
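A small side-by-side sketch of why that space in the format matters. The variable names loose and tight are just for illustration:

```tcl
set str "some text,0x0110, 0xa0, 0xff, 0x02"

# No space before the fields: %[^,] keeps the blank that follows each comma
scan $str {some text,%[^,],%[^,],%[^,],%[^,]} a b c d
set loose [list $a $b $c $d]

# Space before each field: the whitespace is skipped before the field starts
scan $str {some text, %[^,], %[^,], %[^,], %[^,]} a b c d
set tight [list $a $b $c $d]

puts $loose
puts $tight
```

In the first result the last three elements each carry a leading space (so they show up brace-quoted when printed as a list); in the second they are clean.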

Another variant:

scan $str {some text, %[xX0-9a-fA-F], %[xX0-9a-fA-F], %[xX0-9a-fA-F], %[xX0-9a-fA-F]} var1 var2 var3 var4

(This one specifies each field as a string consisting only of the character x (upper or lower case) and the hexadecimal digits, in any order.)

This is a bit unwieldy. You can make it a little less complex by building it from pieces:

set X {%[xX0-9a-fA-F]}
# -> %[xX0-9a-fA-F]
set fmt [join [concat {{some text}} [lrepeat 4 $X]] {, }]
# -> some text, %[xX0-9a-fA-F], %[xX0-9a-fA-F], %[xX0-9a-fA-F], %[xX0-9a-fA-F]
scan $str $fmt var1 var2 var3 var4
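Building the format from pieces also makes it easy to vary the number of fields. A rough sketch, assuming the prefix is always the literal "some text"; the proc name hexFields is made up for this example. It uses the form of scan without variable names, so the fields come back as a list:

```tcl
# Hypothetical helper: return the first n hex literals after the prefix.
proc hexFields {str n} {
    set X {%[xX0-9a-fA-F]}
    # "some text" followed by n field specifiers, joined by comma+space
    set fmt [join [concat {{some text}} [lrepeat $n $X]] {, }]
    return [scan $str $fmt]
}

set str "some text,0x0110, 0xa0, 0xff, 0x02"
puts [hexFields $str 4]
puts [hexFields $str 2]
```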

There are more ways to do it. You can't split it directly into a list, since split $str {, } will split on either comma or space, not the string comma+space (well, you can, but it isn't really convenient). But: if you first convert all comma+space strings to just commas, split becomes useful:

string map {{, } ,} $str
# -> some text,0x0110,0xa0,0xff,0x02
split [string map {{, } ,} $str] ,
# -> {some text} 0x0110 0xa0 0xff 0x02
lrange [split [string map {{, } ,} $str] ,] 1 end
# -> 0x0110 0xa0 0xff 0x02

leading to:

lassign [lrange [split [string map {{, } ,} $str] ,] 1 end] var1 var2 var3 var4

Which gives you the assignments you want.
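If the spacing around the commas isn't reliably a single space, a variant is to split on the comma alone and trim each field afterwards. This is only a sketch, and the irregular input string below is invented for the demonstration:

```tcl
# Made-up input with uneven whitespace (spaces and a tab) around the commas
set str "some text,0x0110,   0xa0,0xff ,\t0x02"

set values {}
foreach field [lrange [split $str ,] 1 end] {
    # string trim removes leading and trailing whitespace from each piece
    lappend values [string trim $field]
}
lassign $values var1 var2 var3 var4
puts "$var1 $var2 $var3 $var4"
```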

I was pondering whether to explain regular expression-based extraction as well, but now I see that glenn jackman* has already done that. Just for completeness I'll add a brief mention of it to my answer too, but I basically have very little to say beyond what he did:

regexp -inline -all -- {0[xX][[:xdigit:]]+} $str
# -> 0x0110 0xa0 0xff 0x02
lassign [regexp -inline -all -- {0[xX][[:xdigit:]]+} $str] var1 var2 var3 var4

There are some differences between my definition and glenn's. He was using word anchors (\m and \M), which doesn't seem quite necessary here (but might prove useful in some exotic cases: it's certainly not wrong to use them). He also matches against a literal x in the prefix of the hex number: I prefer to match against either an upper or lower case x ([xX]). In practice, hexadecimal literals are almost always written as 0x... but you can never be quite sure. So the differences boil down to him wanting to be extra sure in one way, and me wanting to be extra sure in another.
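To illustrate the kind of exotic case where the word anchors make a difference, here is a sketch with a deliberately awkward test string of my own (it is not from the question):

```tcl
# Invented input: one hex-looking run glued to a word, one clean literal,
# and one literal with trailing letters
set s "id0x12, 0xff, 0x1fzz"

# Without anchors, anything that looks like 0x<hexdigits> is picked up
set plain [regexp -inline -all {0x[[:xdigit:]]+} $s]

# With \m and \M, the match must start and end on a word boundary
set anchored [regexp -inline -all {\m0x[[:xdigit:]]+\M} $s]

puts $plain
puts $anchored
```

The unanchored version returns three matches, including the fragments taken out of id0x12 and 0x1fzz; the anchored version returns only 0xff.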

The invocation of regexp says to return a list of the matches (-inline), to match all occurrences of the regular expression (-all), and to stop looking for options (--). The expression itself matches a zero character (0), followed by an upper or lower case x character ([xX]), followed by one or more (+) hex digits (the character class [:xdigit:] inside a bracket expression [...]). Again, the braces around the expression prevent Tcl from trying to evaluate the text inside square brackets as commands.
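Since the question asks for integers, here is a sketch of turning the matched literals into numeric values. The mixed-case input is my own variation on the original string, to show the [xX] part doing its job:

```tcl
# Variation on the original string with an upper-case X and upper-case digits
set str "some text,0X0110, 0xa0, 0xFF, 0x02"

set ints {}
foreach lit [regexp -inline -all -- {0[xX][[:xdigit:]]+} $str] {
    # expr understands hexadecimal literals; adding 0 yields the integer value
    lappend ints [expr {$lit + 0}]
}
puts $ints
```

The literals can also be left as they are: Tcl treats a string like 0xff as a number wherever a number is expected, so the conversion is only needed if you want the decimal form.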

Commands used: set, scan, join, concat, lrepeat, string, split, lrange, lassign, regexp.

*) Mr Jackman does sign his name that way. It feels strange and vaguely disrespectful to write someone else's name in all lower case, but OTOH it also feels wrong to change the way someone themselves have chosen to write their name.

Upvotes: 3

glenn jackman

Reputation: 246942

Extracting the values with a regex

% set s "some text,0x0110, 0xa0, 0xff, 0x02"
some text,0x0110, 0xa0, 0xff, 0x02
% set xnums [regexp -inline -all {\m0x[[:xdigit:]]+\M} $s]
0x0110 0xa0 0xff 0x02
% lassign $xnums var1 var2 var3 var4

Upvotes: 2
