my_question
my_question

Reputation: 3235

A follow-on to Donal about expr command performance

I just read the great reply from Donal Fellows on this question. It is so informative.

But I have a question about this part: never use a dynamic expression with if, for or while as that will suppress a lot of compilation.

I read it twice, but do not think I totally get it.

Donal or others, could you elaborate it a little bit more?

[UPDATE1] Out of curiosity, I tried inside tkcon the example Donal gave in his reply:

% set a {1||0}
1||0
% set b {[exit 1]}
[exit 1]
% expr {$a + $b}
can't use non-numeric string as operand of "+"
% expr $a + $b
1

Interesting, why "expr $a + $b" ends up "1"? Isn't "expr $a + $b" expanded into "expr 1||0 + [exit 1]"? If I just run the expanded version, tkcon just closes which makes senses to me because [exit 1] runs.

[UPDATE2] I am still pondering about my question in UPDATE1. As suggested, I did one more experiment:

% concat $a + $b
1||0 + [exit 1]
% expr 1||0 + [exit 1]
...tkcon closes...

tkcon closing is what I expected, still wondering why expr $a + $b yields 1.

Upvotes: 1

Views: 82

Answers (1)

Donal Fellows
Donal Fellows

Reputation: 137627

You're strongly discouraged from composing expressions that way, as it is easy to get it wrong and create a (potential) security hole.

set x 1
set y {[exec echo >@stdout rm -rf /]};   # Assume this string has come from the user
expr $x+$y
# After Tcl language substitution, it's equivalent to this:
#     expr {1+[exec echo >@stdout rm -rf /]}
# If you're not sure why that might be a problem, think a little more...

It also is not strongly compiled, since Tcl's bytecode compiler does not (in general) do a lot of constant folding, and instead you get an invoke of the opcode that takes a string and compiles that string at runtime into bytecode for execution. It's not efficient as well as unsafe.

However, there's more. If we look at this instead:

if $x==$y {
    # ...
}

The body of that if is not compiled because the if compiler code is just sees the substitutions and bails out, pushing things back to (effectively) interpreted mode execution. That will slow down the whole arm of the if. If you are doing composed expressions (which I discourage for safety reasons) then at least do this:

if {[expr $x==$y]} {
    # ...
}

That at least keeps the if in its efficient mode. (It's otherwise semantically equivalent.)


Bytecode for the above

First, for expr.

% tcl::unsupported::disassemble script {expr $x+$y}
ByteCode 0x0x1008b2210, refCt 1, epoch 96, interp 0x0x100829a10 (epoch 96)
  Source "expr $x+$y"
  Cmds 1, src 10, inst 12, litObjs 3, aux 0, stkDepth 3, code/src 0.00
  Commands 1:
      1: pc 0-10, src 0-9
  Command 1: "expr $x+$y"
    (0) push1 0     # "x"
    (2) loadStk 
    (3) push1 1     # "+"
    (5) push1 2     # "y"
    (7) loadStk 
    (8) strcat 3 
    (10) exprStk 
    (11) done 
% tcl::unsupported::disassemble script {expr {$x+$y}}
ByteCode 0x0x1008eb610, refCt 1, epoch 96, interp 0x0x100829a10 (epoch 96)
  Source "expr {$x+$y}"
  Cmds 1, src 12, inst 8, litObjs 2, aux 0, stkDepth 2, code/src 0.00
  Commands 1:
      1: pc 0-6, src 0-11
  Command 1: "expr {$x+$y}"
    (0) push1 0     # "x"
    (2) loadStk 
    (3) push1 1     # "y"
    (5) loadStk 
    (6) add 
    (7) done 

Note that in the first version we use exprStk (a general string operation) whereas the second version uses add (which knows it is working with numbers and throws errors otherwise).

Then, for if.

% tcl::unsupported::disassemble script {if $x==$y {
    incr hiya
}}
ByteCode 0x0x10095e210, refCt 1, epoch 96, interp 0x0x100829a10 (epoch 96)
  Source "if $x==$y {\n        incr hiya\n   "...
  Cmds 1, src 35, inst 17, litObjs 5, aux 0, stkDepth 4, code/src 0.00
  Commands 1:
      1: pc 0-15, src 0-34
  Command 1: "if $x==$y {\n        incr hiya\n   "...
    (0) push1 0     # "if"
    (2) push1 1     # "x"
    (4) loadStk 
    (5) push1 2     # "=="
    (7) push1 3     # "y"
    (9) loadStk 
    (10) strcat 3 
    (12) push1 4    # "\n        incr hiya\n  "...
    (14) invokeStk1 3 
    (16) done 
% tcl::unsupported::disassemble script {if {[expr $x==$y]} {
    incr hiya
}}
ByteCode 0x0x10095cc10, refCt 1, epoch 96, interp 0x0x100829a10 (epoch 96)
  Source "if {[expr $x==$y]} {\n        incr hiya\n   "...
  Cmds 3, src 44, inst 32, litObjs 5, aux 0, stkDepth 3, code/src 0.00
  Commands 3:
      1: pc 0-30, src 0-43        2: pc 0-10, src 5-15
      3: pc 14-26, src 29-37
  Command 1: "if {[expr $x==$y]} {\n        incr hiya\n   "...
  Command 2: "expr $x==$y"...
    (0) push1 0     # "x"
    (2) loadStk 
    (3) push1 1     # "=="
    (5) push1 2     # "y"
    (7) loadStk 
    (8) strcat 3 
    (10) exprStk 
    (11) nop 
    (12) jumpFalse1 +17     # pc 29
  Command 3: "incr hiya"...
    (14) startCommand +13 1     # next cmd at pc 27, 1 cmds start here
    (23) push1 3    # "hiya"
    (25) incrStkImm +1 
    (27) jump1 +4   # pc 31
    (29) push1 4    # ""
    (31) done 

Notice how the second version has understood that it is doing an increment (incrStkImm)? That helps a lot with performance, especially for longer, less-trivial scripts. The first version just assembles a list of arguments and uses invokeStk1 to call the interpreted if implementation.

FWIW, the “gold standard” (assuming we're not in a procedure) is this:

% tcl::unsupported::disassemble script {if {$x==$y} {
    incr hiya
}}
ByteCode 0x0x1008efb10, refCt 1, epoch 96, interp 0x0x100829a10 (epoch 96)
  Source "if {$x==$y} {\n    incr hiya\n"...
  Cmds 2, src 29, inst 18, litObjs 4, aux 0, stkDepth 2, code/src 0.00
  Commands 2:
      1: pc 0-16, src 0-28        2: pc 9-12, src 18-26
  Command 1: "if {$x==$y} {\n    incr hiya\n"...
    (0) push1 0     # "x"
    (2) loadStk 
    (3) push1 1     # "y"
    (5) loadStk 
    (6) eq 
    (7) jumpFalse1 +8   # pc 15
  Command 2: "incr hiya"...
    (9) push1 2     # "hiya"
    (11) incrStkImm +1 
    (13) jump1 +4   # pc 17
    (15) push1 3    # ""
    (17) done 

And for completeness, inside a procedure (well, lambda in this case, but the bytecode is identical):

tcl::unsupported::disassemble lambda {{} {if {$x==$y} {
    incr hiya
}}}
ByteCode 0x0x1008ecc10, refCt 1, epoch 96, interp 0x0x100829a10 (epoch 96)
  Source "if {$x==$y} {\n    incr hiya\n"...
  Cmds 2, src 29, inst 15, litObjs 1, aux 0, stkDepth 2, code/src 0.00
  Proc 0x0x102024610, refCt 1, args 0, compiled locals 3
      slot 0, scalar, "x"
      slot 1, scalar, "y"
      slot 2, scalar, "hiya"
  Commands 2:
      1: pc 0-13, src 0-28        2: pc 7-9, src 18-26
  Command 1: "if {$x==$y} {\n    incr hiya\n"...
    (0) loadScalar1 %v0     # var "x"
    (2) loadScalar1 %v1     # var "y"
    (4) eq 
    (5) jumpFalse1 +7   # pc 12
  Command 2: "incr hiya"...
    (7) incrScalar1Imm %v2 +1   # var "hiya"
    (10) jump1 +4   # pc 14
    (12) push1 0    # ""
    (14) done 

Upvotes: 1

Related Questions