Rob Arthan

Reputation: 174

Rationale for POSIX specification for variable assignments in a shell command

I've just tracked down a problem with a shell script of mine on systems like Ubuntu that use dash for /bin/sh. My script needs to pass some environment variables in when it executes a binary and, for reasons that are not relevant here, it runs the binary using eval. A cut-down version using "env" for the binary would be like this:

#!/bin/sh
RUNBIN=env
XYZ=abc eval $RUNBIN

With dash, the above fails to pass XYZ=abc into the environment when it runs env. If you play around some more, you will also find that it declares XYZ as a (non-exported) shell variable for the rest of the script. I presume this is down to an issue raised against the 2008 POSIX specification and addressed in the 2013 edition, which says:

If no command name results [from processing the command line], or if the command name is a special built-in or function, variable assignments shall affect the current execution environment. Otherwise, the variable assignments shall be exported for the execution environment of the command and shall not affect the current execution environment except as a side-effect of the expansions performed [while doing tilde expansion and other stuff to get the command line]

The issue is actually about the processing of a list of variable assignments and whether earlier assignments are visible in later ones, but the change in wording seems to have had an unintended side-effect of making the variable assignments completely useless if the shell built-in you run is going to execute something. You get the same problem if you do something like

echo 'env | grep XYZ' > t
XYZ=abc . ./t

which prints nothing. bash does what I would expect (with the last example it prints XYZ=abc). With --posix it additionally assigns (but does not export) XYZ=abc for subsequent commands. So:

XYZ=abc . ./t
echo XYZ=$XYZ

prints XYZ=abc twice.

I find it odd that the variable assignments persist into subsequent commands when the command is a built-in, but life is full of oddities. However, it just seems plain wrong that variable assignments on a command line aren't exported into any commands the command line runs. Unfortunately, bash and I seem to be in the minority about this - on my Mac, ksh and zsh do what dash does. It is easy enough, but inelegant, to work around this behaviour using export and brackets (a subshell) to delimit the scope of the variables, as sketched below. My question is: why would anybody want the POSIX behaviour? Answers giving examples where it is useful in practice would be especially welcome. Or should this be reported as a bug in POSIX?
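For concreteness, the workaround I have in mind is roughly this (the subshell keeps the export from leaking into the rest of the script):

(
    export XYZ=abc
    eval $RUNBIN
)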

Upvotes: 2

Views: 1070

Answers (1)

Craig Estey

Reputation: 33631

Note: "shell" is a somewhat nebulous term for a command-line interpreter. Each shell (e.g. sh, bash, ksh, dash, tcsh, ...) is free to interpret things as it chooses. Some don't even have sh-like syntax [or semantics] at all.

However, most shells do follow the standard rules, because they tend to make the most sense. The POSIX standard does make sense, when broken down a bit as I'll try to do below.


If no command name results [from processing the command line], variable assignments shall affect the current execution environment.

This covers the following:

XYZ=abc
echo $XYZ

XYZ becomes an ordinary shell variable. It is not exported to the environment. This is what one would expect.
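A quick illustration of the distinction (assuming any POSIX-style shell):

XYZ=abc
echo "$XYZ"           # prints abc -- the shell variable is set
env | grep '^XYZ='    # prints nothing -- the variable was not exported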


If the command name is a special built-in or function, variable assignments shall affect the current execution environment.

This covers the following:

XYZ=abc builtin
XYZ=def myfunction

It is [effectively] shorthand for:

XYZ=abc ; builtin
XYZ=def ; myfunction

The reason for this is that builtins and functions run in the current environment and need access to, or may modify, variables there:

myfunction () {
    XYZ=qrm$XYZ
}
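So in a shell that follows this rule (dash, for example), the prefix assignment is visible inside the function and persists afterwards:

XYZ=abc
XYZ=def myfunction   # the function sees XYZ=def and modifies it
echo "$XYZ"          # dash prints qrmdef -- both the assignment and the modification persisted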

However ...

bash (e.g.) does not do this by default [without --posix]. To implement its default behavior, bash must "clone" [a portion of] the environment [herein XYZ] for the duration of the builtin/function. Although this may be technically more correct (i.e. it behaves more like the external-program case), it also adds complexity to the implementation.
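In other words, with default (non-POSIX) bash the same sequence behaves as if the assignment were scoped to the call:

XYZ=abc
XYZ=def myfunction   # myfunction sees XYZ=def for the duration of the call
echo "$XYZ"          # default bash prints abc -- the assignment did not persist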

POSIX chose the definition that leads to a simpler implementation. Also, the fact that the majority of non-bash shells were doing it one way and bash another may have influenced things.


Otherwise, the variable assignments shall be exported for the execution environment of the command and shall not affect the current execution environment

This covers:

XYZ=abc external_program

The reason is that this is [effectively] shorthand for:

env XYZ=abc external_program

The behavior is easy to implement by setting XYZ in the child [after the fork and before the execvp], so no additional complexity [in the form of a new mechanism] is needed.
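Conceptually (this is only a sketch of the mechanism, not literally what the shell executes), it behaves like:

# fork a child (subshell), set the variable there, then exec the program;
# the parent shell's environment is untouched
( XYZ=abc; export XYZ; exec external_program )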

Side note: If you want to set the variable in both the current environment and the command's environment:

export XYZ=abc
external_program
myfunction
echo $XYZ

When we do XYZ=abc <some_command>, ideally we want XYZ to persist only for the duration of the command (i.e. that's the most useful model).

For an external, this is easy enough, as explained above.

But, for an internal command (a builtin or function), the only way to make the variable visible to it is to modify the current environment; otherwise, builtins and functions can't work correctly. So, the changes persist after the command has executed [exception: bash, as noted above].
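For example, with a special built-in such as ':' in a POSIX-conforming shell (dash here; bash needs --posix for this):

XYZ=abc :          # ':' is a special built-in, so the assignment persists
echo "XYZ=$XYZ"    # dash prints XYZ=abc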


So, if you are using a shell that doesn't quite follow the above POSIX rules, you can [probably] guarantee the effect you want by explicitly using the "shorthand" versions above [or whatever special shell-specific sequence you need].


Personally, I dislike allowing XYZ=abc <command_of_any_sort> at all. If you want this, do env XYZ=abc <command> or XYZ=abc ; myfunction. IMO, the "compound" version is just needless cruft that requires that one learn "yet another [minor] thing" ...

That's [just] my opinion. Obviously, there are others.

POSIX has to balance opinion, known current implementations, lack of spec clarity, etc.

So, is POSIX broken? In this instance, it has specific, well defined behavior. So, probably, the answer is no, even if there are [arguably] better ways to do it.


UPDATE:

Per your comment, take heart. There is a way to get the results for a builtin/function without resorting to eval.

Upon further thought, I dislike the default bash behavior for two reasons.

First and foremost, it's non-standard [not POSIX compliant]. So, it's the "odd duck".

Secondly, there is a clean way to get the bash behavior, if one so desires, by using scoped local variables (local is not strictly POSIX, but dash, bash, zsh, and most other sh-like shells support it). Essentially, this is what bash was doing anyway (i.e. injecting the variable into the function's scoped environment).

Here are four ways to get the bash equivalent behavior [albeit with a bit more setup]:

#!/bin/sh -

myfnc () {
    XYZ=myf$XYZ
    echo "myfnc: XYZ=$XYZ"
}

# ------------------------------------------------------------------------------
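# bashlike1: declare XYZ local, then use the prefix-assignment syntax; any persistence stays in this function's scope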
bashlike1 () {
    local XYZ
    echo
    echo "bashlike1: ..."
    XYZ=def myfnc
}

XYZ="abc"
bashlike1
echo "parent: XYZ=$XYZ"

# ------------------------------------------------------------------------------
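# bashlike2: assign the local value directly, then invoke the function named in $1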
bashlike2 () {
    local XYZ="def"
    echo
    echo "bashlike2: ..."
    $1
}

XYZ="abc"
bashlike2 myfnc
echo "parent: XYZ=$XYZ"

# ------------------------------------------------------------------------------
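# bashlike3: take the assignment itself as an argument, declare it local, then invoke the function named in $2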
bashlike3 () {
    local $1
    echo
    echo "bashlike3: ..."
    $2
}

XYZ="abc"
bashlike3 "XYZ=def" myfnc
echo "parent: XYZ=$XYZ"

# ------------------------------------------------------------------------------
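# bashlike4: take a complete "local ..." command as an argument, run it, then invoke the function named in $2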
bashlike4 () {
    $1
    echo
    echo "bashlike4: ..."
    $2
}

XYZ="abc"
bashlike4 "local XYZ=def" myfnc
echo "parent: XYZ=$XYZ"

Here is the output of /bin/sh <script> and /bin/bash --posix <script>:

bashlike1: ...
myfnc: XYZ=myfdef
parent: XYZ=myfdef

bashlike2: ...
myfnc: XYZ=myfdef
parent: XYZ=abc

bashlike3: ...
myfnc: XYZ=myfdef
parent: XYZ=abc

bashlike4: ...
myfnc: XYZ=myfdef
parent: XYZ=abc

However, /bin/bash <script> and /bin/dash <script> produce:

bashlike1: ...
myfnc: XYZ=myfdef
parent: XYZ=abc

bashlike2: ...
myfnc: XYZ=myfdef
parent: XYZ=abc

bashlike3: ...
myfnc: XYZ=myfdef
parent: XYZ=abc

bashlike4: ...
myfnc: XYZ=myfdef
parent: XYZ=abc

And /bin/zsh <script> produces:

bashlike1: ...
myfnc: XYZ=myfdef
parent: XYZ=

bashlike2: ...
myfnc: XYZ=myfdef
parent: XYZ=abc

bashlike3: ...
myfnc: XYZ=myfdef
parent: XYZ=abc

bashlike4: ...
myfnc: XYZ=myfabc
parent: XYZ=myfabc

So, bashlike2 and bashlike3 produce consistent results across the board. But bashlike1 [which tries to match the original question's syntax most closely] has the widest variation.

And, zsh doesn't like bashlike4 at all. That's too bad because it would have allowed:

bashlike4 "local X=3 ; local Y=4 ; local Z=5" ...

UPDATE #2:

The first update was just for builtin/function, so it wouldn't work [wasn't intended to work] for external programs.

Caveat: Without having your full usage, this is a bit of guesswork.

I run Fedora, so I can't say for certain, but Ubuntu normally installs bash alongside dash. In that case, using #!/bin/bash - (vs. #!/bin/sh -) may be the simple solution. However, I'm going to assume that's not feasible.

The next choice would be:

RUNBIN="env XYZ=abc external_program"
eval $RUNBIN

I assume external_program is somewhat complex, involving some variables, or the eval wouldn't be necessary. If XYZ=abc were more complex as well, this might involve escaping $, etc. as I believe you've mentioned. So, maybe not an option either.

The underlying issue seems to be that when you set/export XYZ for one program, you do not want it to linger in the top/parent environment [or it will change the parent's value of XYZ which will be used later].

Because the XYZ=abc eval $RUNBIN syntax is inconsistent/unpredictable across various shells, we may have to [no matter how convenient it may be] discard it in favor of something that is a bit more "retro" but will always work:

# (1) simple export -- parent gets XYZ
export XYZ=abc
eval $RUNBIN

# (2) export then unset -- parent temporarily gets XYZ
export XYZ=abc
eval $RUNBIN
unset XYZ

# (3) preserve, export, restore -- parent's original is preserved
SAVE_XYZ=$XYZ
export XYZ=abc
eval $RUNBIN
unset XYZ
XYZ=$SAVE_XYZ
unset SAVE_XYZ

# (4) similar to (3), but we've created some helper functions to streamline (sketched below)
preserve XYZ=abc QRM=jkl
eval $RUNBIN
restore XYZ QRM
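The preserve/restore helpers used in (4) are not defined above; here is a minimal sketch of what they might look like (the names and the SAVE_* convention are my own assumptions, and this simple version cannot tell an unset variable from one set to the empty string):

# hypothetical helpers -- a sketch only
preserve () {
    for asg in "$@"; do                  # each argument is NAME=value
        name=${asg%%=*}
        value=${asg#*=}
        eval "SAVE_$name=\$$name"        # remember the current value
        eval "export $name=\"\$value\""  # set and export the new value
    done
}

restore () {
    for name in "$@"; do                 # each argument is a NAME
        unset "$name"                    # drop the export attribute
        eval "$name=\$SAVE_$name"        # put the saved value back
        unset "SAVE_$name"
    done
}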

Another equally [or more so] retro method would be to dynamically build up a temporary wrapper script. The first N lines are export SYM=val followed by the final target invocation:

export XYZ=abc
export QRM=jkl
external_program $*

Then, invoke it via:

RUNBIN="wrapper ..."
eval $RUNBIN

If we generalize the wrapper-script approach, we can dump whatever variables we need into the file and then replace eval $RUNBIN with echo $RUNBIN >> wrapper. Now eval is no longer required, as the echo gives the same effect, more or less. This is how it was done before eval ever existed, sometimes by using an awk script to generate the wrapper script.
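A rough sketch of that generalized approach (the wrapper file name and the particular exports are placeholders):

# build the wrapper dynamically, then run it instead of using eval
{
    echo '#!/bin/sh -'
    echo 'export XYZ=abc'
    echo 'export QRM=jkl'
} > wrapper
echo "$RUNBIN" >> wrapper    # instead of: eval $RUNBIN
sh ./wrapper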

What we're dealing with is what I call "shell fragmentation" [a la "Android fragmentation"]. On a given system, we don't really know which flavor of *sh has been installed as /bin/sh, or whether a given flavor is installed at all [under its secondary name].

However, on a given system other tools are not fragmented. tcsh, perl, python, and [yikes] perl6 are each consistent: if one of them is installed, we can assume that virtually everything runs the same way. In perl, for newer features, there is a version number the script can check.

I wrote shell scripts for many years until I realized that perl could replace any combination of shell, sed, awk, etc. and do so more cleanly. So, for my personal stuff, it's 100% perl.

Probably not an option for you, but I would have felt remiss if I didn't mention it.

Upvotes: 4
