Reputation: 174
I've just tracked down a problem with a shell script of mine on systems like Ubuntu that use dash for /bin/sh. My script needs to pass some environment variables in when it executes a binary and, for reasons that are not relevant here, it runs the binary using eval. A cut-down version using "env" for the binary would be like this:
#!/bin/sh
RUNBIN=env
XYZ=abc eval $RUNBIN
With dash, the above fails to pass XYZ=abc into the environment when it runs env. If you play around some more you will also find that it declares XYZ as a (non-exported) shell variable for the rest of the script. I presume this is down to an issue raised against the 2008 POSIX spec and addressed in the 2013 edition, which says:
If no command name results [from processing the command line], or if the command name is a special built-in or function, variable assignments shall affect the current execution environment. Otherwise, the variable assignments shall be exported for the execution environment of the command and shall not affect the current execution environment except as a side-effect of the expansions performed [while doing tilde expansion and other stuff to get the command line]
The issue is actually about the processing of a list of variable assignments and whether earlier assignments are visible in later ones, but the change in wording seems to have had an unintended side-effect of making the variable assignments completely useless if the shell built-in you run is going to execute something. You get the same problem if you do something like
echo 'env | grep XYZ' > t
XYZ=abc . ./t
which prints nothing. bash does what I would expect (with the last example it prints XYZ=abc). With --posix it additionally assigns (but does not export) XYZ=abc in subsequent commands. So:
XYZ=abc . ./t
echo XYZ=$XYZ
prints XYZ=abc twice.
I find it odd that the variable assignments persist into subsequent commands when the command is a built-in, but life is full of oddities. However it just seems plain wrong that variable assignments on a command line aren't exported into any commands the command line runs. Unfortunately bash and I seem to be in the minority about this: on my Mac, ksh and zsh do what dash does. It is easy enough, but inelegant, to work around this behaviour using export and brackets to delimit the scope of the variables. My question is: why would anybody want the POSIX behaviour? Are there examples where it is useful in practice? Or should this be reported as a bug in POSIX?
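For reference, the export-and-brackets workaround mentioned above could look roughly like this. It is a minimal sketch: env stands in for the real binary as in the cut-down script, and the variable name result and the starting value outer are made up for the demonstration.

```shell
#!/bin/sh
RUNBIN=env
XYZ=outer            # illustrative value already present in the script

# The parentheses start a subshell, so the export is discarded when it exits.
result=$( (XYZ=abc; export XYZ; eval "$RUNBIN") | grep '^XYZ=' )

echo "command saw: $result"     # command saw: XYZ=abc
echo "script has:  XYZ=$XYZ"    # script has:  XYZ=outer
```

The cost is an extra subshell per invocation, which is why it feels inelegant next to the one-line prefix form.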
Upvotes: 2
Views: 1070
Reputation: 33631
Note: A shell is a somewhat nebulous term for a command line interpreter. Each shell (e.g. sh, bash, ksh, dash, tcsh, ...) is free to interpret things as it chooses. Some don't even have sh-like syntax [or semantics] at all.
However, most shells do follow the standard rules, because they tend to make the most sense. The POSIX standard does make sense, when broken down a bit as I'll try to do below.
If no command name results [from processing the command line], variable assignments shall affect the current execution environment.
This covers the following:
XYZ=abc
echo $XYZ
XYZ is a simple variable. It does not set the exported environment. This is what one would expect.
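A quick way to check this is to ask a child process what it sees. This is a minimal sketch; the names before and after are only for illustration.

```shell
unset XYZ                                   # start from a clean slate
XYZ=abc                                     # plain assignment, not exported
before=$(sh -c 'echo "${XYZ-unset}"')       # child does not see it
export XYZ                                  # now mark it for export
after=$(sh -c 'echo "${XYZ-unset}"')        # child sees abc

echo "before export: $before"               # before export: unset
echo "after export:  $after"                # after export:  abc
```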
if the command name is a special built-in or function, variable assignments shall affect the current execution environment.
This covers the following:
XYZ=abc builtin
XYZ=def myfunction
It is [effectively] shorthand for:
XYZ=abc ; builtin
XYZ=def ; myfunction
The reason for this is that builtins and/or functions run in the current environment and need access to or may modify variables there:
# POSIX form; the "function" keyword is a bash/ksh extension
myfunction ()
{
    XYZ=qrm$XYZ
}
However ...
bash (e.g.) does not do this by default [without --posix]. To implement its default behavior, bash must "clone" [a portion of] the environment [herein XYZ] for the builtin/function duration. Although it may be technically more correct (i.e. it behaves more like the external program case), it also adds complexity to the implementation.
POSIX chose the definition that leads to a simpler implementation. Also, the fact that the majority of non-bash shells were doing it one way and bash another may have influenced things.
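To see which behavior a particular shell implements, a small probe like the following can be run under each one. This is only a sketch; probe_fn is a made-up name, and no expected result is given for the last line because it depends on the shell and its mode.

```shell
probe_fn () {
    XYZ=qrm$XYZ                  # modify the variable from inside the function
    echo "inside: XYZ=$XYZ"
}

XYZ=abc
XYZ=def probe_fn                 # prefix assignment on a function call
echo "after:  XYZ=$XYZ"          # unchanged or modified, depending on the shell
```

If the last line still shows abc, the shell scoped the assignment to the call; anything else means the assignment leaked into the current environment.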
Otherwise, the variable assignments shall be exported for the execution environment of the command and shall not affect the current execution environment
This covers:
XYZ=abc external_program
The reason is that this is [effectively] shorthand for:
env XYZ=abc external_program
The behavior is easy to implement by setting XYZ in the child [after the fork and before the execvp], so no additional complexity [in the form of a new mechanism] is needed.
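The external-program case is easy to observe. A minimal sketch, using sh -c as a stand-in for external_program (the names child and parent are just for the demonstration):

```shell
unset XYZ
child=$(XYZ=abc sh -c 'echo "${XYZ-unset}"')   # assignment applies to the child only
parent=${XYZ-unset}                            # the current shell is untouched

echo "child:  $child"     # child:  abc
echo "parent: $parent"    # parent: unset
```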
Side note: If you want to set both environments:
export XYZ=abc
external_program
myfunction
echo $XYZ
When we do XYZ=abc <some_command>, ideally we want XYZ to persist only for the duration of the command (i.e. that's the most useful model).
For an external, this is easy enough, as explained above.
But, for an internal, the only way to do this is to modify the current environment. Otherwise, builtins and functions can't work correctly. So, the changes must persist after the command is executed [Exception: bash as noted above].
So, if you are using a shell that doesn't quite follow the above POSIX rules, you can [probably] guarantee the effect you want by explicitly using the "shorthand" versions above [or whatever special shell specific sequence you need].
Personally, I dislike allowing XYZ=abc <command_of_any_sort> at all. If you want this, do env XYZ=abc <command> or XYZ=abc ; myfunction. IMO, the "compound" version is just needless cruft that requires that one learn "yet another [minor] thing" ...
That's [just] my opinion. Obviously, there are others.
POSIX has to balance opinion, known current implementations, lack of spec clarity, etc.
So, is POSIX broken? In this instance, it has specific, well defined behavior. So, probably, the answer is no, even if there are [arguably] better ways to do it.
UPDATE:
Per your comment, take heart. There is a way to get the results for a builtin/function without resorting to eval.
Upon further thought, I dislike the default bash behavior for two reasons.
First and foremost, it's non-standard [not POSIX compliant]. So, it's the "odd duck".
Secondly, there is a clean way to get the bash behavior if one so desires [in a posix compliant shell], by using scoped local variables. Essentially, this is what bash was doing anyway (i.e. injecting the variable into the function's scoped environment).
Here are four ways to get the bash equivalent behavior [albeit with a bit more setup]:
#!/bin/sh -
myfnc () {
    XYZ=myf$XYZ
    echo "myfnc: XYZ=$XYZ"
}
# ------------------------------------------------------------------------------
bashlike1 () {
    local XYZ
    echo
    echo "bashlike1: ..."
    XYZ=def myfnc
}
XYZ="abc"
bashlike1
echo "parent: XYZ=$XYZ"
# ------------------------------------------------------------------------------
bashlike2 () {
    local XYZ="def"
    echo
    echo "bashlike2: ..."
    $1
}
XYZ="abc"
bashlike2 myfnc
echo "parent: XYZ=$XYZ"
# ------------------------------------------------------------------------------
bashlike3 () {
    local $1
    echo
    echo "bashlike3: ..."
    $2
}
XYZ="abc"
bashlike3 "XYZ=def" myfnc
echo "parent: XYZ=$XYZ"
# ------------------------------------------------------------------------------
bashlike4 () {
    $1
    echo
    echo "bashlike4: ..."
    $2
}
XYZ="abc"
bashlike4 "local XYZ=def" myfnc
echo "parent: XYZ=$XYZ"
Here is the output of /bin/sh <script> and /bin/bash --posix <script>:
bashlike1: ...
myfnc: XYZ=myfdef
parent: XYZ=myfdef
bashlike2: ...
myfnc: XYZ=myfdef
parent: XYZ=abc
bashlike3: ...
myfnc: XYZ=myfdef
parent: XYZ=abc
bashlike4: ...
myfnc: XYZ=myfdef
parent: XYZ=abc
However, /bin/bash <script> and /bin/dash <script> produce:
bashlike1: ...
myfnc: XYZ=myfdef
parent: XYZ=abc
bashlike2: ...
myfnc: XYZ=myfdef
parent: XYZ=abc
bashlike3: ...
myfnc: XYZ=myfdef
parent: XYZ=abc
bashlike4: ...
myfnc: XYZ=myfdef
parent: XYZ=abc
And /bin/zsh <script> produces:
bashlike1: ...
myfnc: XYZ=myfdef
parent: XYZ=
bashlike2: ...
myfnc: XYZ=myfdef
parent: XYZ=abc
bashlike3: ...
myfnc: XYZ=myfdef
parent: XYZ=abc
bashlike4: ...
myfnc: XYZ=myfabc
parent: XYZ=myfabc
So, bashlike2 and bashlike3 produce consistent results across the board. But bashlike1 [which tries to match the original question syntax most closely] has the widest variation.
And zsh doesn't like bashlike4 at all. That's too bad because it would have allowed:
bashlike4 "local X=3 ; local Y=4 ; local Z=5" ...
UPDATE #2:
The first update was just for builtin/function, so it wouldn't work [wasn't intended to work] for external programs.
Caveat: Without having your full usage, this is a bit of guesswork.
I run Fedora so I can't say whether this is true, but Ubuntu may install bash in parallel to dash. In that case, using #!/bin/bash - (vs #!/bin/sh -) may be the simple solution. However, I'm going to assume that's not feasible.
The next choice would be:
RUNBIN="env XYZ=abc external_program"
eval $RUNBIN
I assume external_program is somewhat complex, involving some variables, or the eval wouldn't be necessary. If XYZ=abc were more complex as well, this might involve escaping $, etc. as I believe you've mentioned. So, maybe not an option either.
The underlying issue seems to be that when you set/export XYZ for one program, you do not want it to linger in the top/parent environment [or it will change the parent's value of XYZ which will be used later].
Because the XYZ=abc eval $RUNBIN syntax is inconsistent/unpredictable across various shells, we may have to [no matter how convenient it may be] discard it in favor of something that is a bit more "retro" but will always work:
# (1) simple export -- parent gets XYZ
export XYZ=abc
eval $RUNBIN
# (2) export then unset -- parent temporarily gets XYZ
export XYZ=abc
eval $RUNBIN
unset XYZ
# (3) preserve, export, restore -- parent's original is preserved
SAVE_XYZ=$XYZ
export XYZ=abc
eval $RUNBIN
unset XYZ
XYZ=$SAVE_XYZ
unset SAVE_XYZ
# (4) similar to (3), but we've created some helper functions to streamline
preserve XYZ=abc QRM=jkl
eval $RUNBIN
restore XYZ QRM
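The preserve and restore helpers used in (4) are not defined above. One possible POSIX sh sketch follows. The _saved_* variable names and the scratch variables are invented here, every NAME is assumed to be a valid identifier, and a restored variable comes back unexported even if it was exported before.

```shell
# preserve NAME=value ...  remember each NAME's current state, then export the new value
preserve () {
    for _pair in "$@"; do
        _name=${_pair%%=*}
        _value=${_pair#*=}
        eval "
            if [ \"\${$_name+set}\" = set ]; then
                _saved_set_$_name=1
                _saved_val_$_name=\$$_name
            else
                _saved_set_$_name=0
            fi
            $_name=\$_value
            export $_name
        "
    done
}

# restore NAME ...  put each NAME back the way preserve found it
restore () {
    for _name in "$@"; do
        eval "_was_set=\$_saved_set_$_name"
        unset "$_name"
        if [ "$_was_set" = 1 ]; then
            eval "$_name=\$_saved_val_$_name"
        fi
        unset "_saved_set_$_name" "_saved_val_$_name"
    done
}

# Usage, with env standing in for the real command
RUNBIN=env
XYZ=orig
unset QRM
preserve XYZ=abc QRM=jkl
seen=$(eval "$RUNBIN" | grep -E '^(XYZ|QRM)=' | sort)
restore XYZ QRM
echo "$seen"
echo "XYZ=$XYZ QRM=${QRM-unset}"    # XYZ=orig QRM=unset
```

Unlike (3), this version tracks whether a variable was set at all, so a variable that did not exist beforehand ends up unset again instead of empty.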
Another equally [or more so] retro method would be to dynamically build up a temporary wrapper script. The first N lines are export SYM=val followed by the final target invocation:
export XYZ=abc
export QRM=jkl
external_program "$@"
Then, invoke it via:
RUNBIN="wrapper ..."
eval $RUNBIN
If we generalize the wrapper script approach, we can dump whatever variables we need into the file. Then, replace eval $RUNBIN with echo $RUNBIN >> wrapper. Now, eval is no longer required as this echo will give the same effect, more or less. This is how it was done before eval ever existed, sometimes by using an awk script to generate the wrapper script.
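A rough sketch of the generated-wrapper idea follows. It assumes mktemp is available (common, but not required by POSIX) and uses env in place of external_program; the names wrapper and out are illustrative.

```shell
wrapper=$(mktemp) || exit 1

# Write the export lines first, then the final target invocation.
{
    echo '#!/bin/sh'
    echo 'export XYZ=abc'
    echo 'export QRM=jkl'
    echo 'exec env "$@"'          # env stands in for external_program
} > "$wrapper"

unset XYZ QRM
out=$(sh "$wrapper" | grep -E '^(XYZ|QRM)=' | sort)
rm -f "$wrapper"

echo "$out"
echo "parent: XYZ=${XYZ-unset} QRM=${QRM-unset}"    # parent: XYZ=unset QRM=unset
```

Because the exports happen inside the wrapper's own process, nothing leaks back into the calling script, whichever shell /bin/sh happens to be.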
What we're dealing with is what I call "shell fragmentation" [ala "android fragmentation"]. On a given system, we don't really know what flavor of *sh has been installed as /bin/sh or if a given flavor is installed at all [under its secondary name].
However, on a given system other tools are not fragmented. tcsh, perl, python, and [yikes] perl6 are all consistent. If installed, we can assume that virtually all things run the same. In perl, for newer features, there is a version number the script can check.
I wrote shell scripts for many years until I realized that perl could replace any combination of shell, sed, awk, etc. and do so more cleanly. So, for my personal stuff, it's 100% perl.
Probably not an option for you, but I would have felt remiss if I didn't mention it.
Upvotes: 4