Reputation: 1402
I am currently working on a language that aims to compile to POSIX shell languages, and I want to introduce a pop feature. Just like how you can use "shift" to remove the first argument passed to a function:
f() {
shift
printf '%s' "$*"
}
f 1 2 3 #=> 2 3
I want some code that when introduced below can remove the last argument.
g() {
# pop
printf '%s' "$*"
}
g 1 2 3 #=> 1 2
I am aware of the array method as detailed in (Remove last argument from argument list of shell script (bash)), but I want something portable that will work in at least the following shells: ash, dash, ksh (Unix), bash, and zsh. I also want something reasonably speedy; anything that spawns external processes or subshells would be too heavy for small argument counts, though if you have a creative solution I wouldn't mind seeing it regardless (it can still be used as a fallback for large argument counts). Something as fast as those array methods would be ideal.
Upvotes: 3
Views: 1300
Reputation: 11
& pure dash compatible... :)
Usage:
test () { echo "$@" ; } ;
with_init test 1 2 3 4 ; # test will be called with: 1 2 3
Lib:
#!/bin/sh
init_arguments_ () { #L variable [#arguments] last ;
local ia_i=0 ia_v="$1" ia_tr= ;
# index variable to_return
shift ;
while [ $(( ia_i += 1 )) -lt $# ] ; do
ia_tr=$ia_tr" \"\$$ia_i\"" ; done ;
eval "$ia_v=\$ia_tr" ;
} ;
unshift () { #L [#arguments ;
local args= ; init_arguments_ args "$@" ;
echo "eval set $args" ;
} ;
unshift_ () { #L arguments_variable
eval "$1=\${$1"'% \"*}' ;
} ;
with_init () {
#L command [#arguments] last
local command=$1 args= ; shift ;
# Maybe get last before removal
# eval 'local last="$'$#\" ;
# Shorter
# $(unshift "$@") ;
# "$command" "$@" ;
# Faster
init_arguments_ args "$@" ;
## Maybe unshift another, nice for loops
# unshift_ args ;
eval "$command$args" ;
# Or: eval "set $args" ;
# "$command" "$@" ;
} ;
Upvotes: 1
Reputation: 69
alias pop='set -- $(eval printf '\''%s\\n'\'' $(seq $(expr $# - 1) | sed '\''s/^/\$/;H;$!d;x;s/\n/ /g'\'') )'
EDIT:
This is a POSIX shell solution that uses an alias instead of a function; when called inside a function, it gives the desired effect (it resets the function's arguments to the same list minus the last one; being an alias, and using eval, it can change the arguments of the enclosing function):
func () {
pop
pop
echo "$@"
}
func a b c d e # prints a b c
Upvotes: 2
Reputation: 425
pop () {
i=0
while [ $((i+=1)) -lt $# ]; do
set -- "$@" "$1"
shift
done # 1 2 3 -> 3 1 2
printf '%s' "$1" # last argument
shift # $@ is now without last argument
}
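A function has its own positional parameters, so the rotation above only changes the arguments of pop itself, not those of its caller. The following is a minimal sketch of the same rotation inlined into the function whose arguments should be popped. The names h and pop_last are hypothetical and chosen only for this illustration, and the sketch assumes at least one argument is passed.

```shell
# Sketch only: h and pop_last are hypothetical names for illustration.
h() {
    i=0
    # Rotate the list $# - 1 times: 1 2 3 -> 2 3 1 -> 3 1 2
    while [ $((i += 1)) -lt $# ]; do
        set -- "$@" "$1"
        shift
    done
    pop_last=$1   # the former last argument is now first
    shift         # "$@" now holds the original list without its last element
    printf '%s|%s\n' "$pop_last" "$*"
}

h 1 2 3   # prints: 3|1 2
```

Note that shift with an empty argument list is an error in some shells, so a guard on $# would be needed before using this with zero arguments.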
Upvotes: 2
Reputation: 1402
This is my current answer:
pop() {
local n=$(($1 - ${2:-1}))
if [ -n "$ZSH_VERSION" -o -n "$BASH_VERSION" ]; then
POP_EXPR='set -- "${@:1:'$n'}"'
elif [ $n -ge 500 ]; then
POP_EXPR="set -- $(seq -s " " 1 $n | sed 's/[0-9]\+/"${\0}"/g')"
else
local index=0
local arguments=""
while [ $index -lt $n ]; do
index=$((index+1))
arguments="$arguments \"\${$index}\""
done
POP_EXPR="set -- $arguments"
fi
}
Note that the local keyword is not POSIX, but since all major sh shells support it (specifically the ones I asked for in my question) and not having it can cause serious bugs, I decided to include it in this leading function. Here's a fully POSIX-compliant version with obfuscated variable names to reduce the chance of name clashes:
pop() {
__pop_n=$(($1 - ${2:-1}))
if [ -n "$ZSH_VERSION" -o -n "$BASH_VERSION" ]; then
POP_EXPR='set -- "${@:1:'$__pop_n'}"'
elif [ $__pop_n -ge 500 ]; then
POP_EXPR="set -- $(seq -s " " 1 $__pop_n | sed 's/[0-9]\+/"${\0}"/g')"
else
__pop_index=0
__pop_arguments=""
while [ $__pop_index -lt $__pop_n ]; do
__pop_index=$((__pop_index+1))
__pop_arguments="$__pop_arguments \"\${$__pop_index}\""
done
POP_EXPR="set -- $__pop_arguments"
fi
}
pop1() {
pop $#
eval "$POP_EXPR"
echo "$@"
}
pop2() {
pop $# 2
eval "$POP_EXPR"
echo "$@"
}
pop1 a b c #=> a b
pop1 $(seq 1 1000) #=> 1 .. 999
pop2 $(seq 1 1000) #=> 1 .. 998
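Because the generated expression refers to each remaining positional parameter by index inside double quotes, arguments containing whitespace should survive the eval unchanged. Here is a minimal sketch of that property; build_pop_expr and demo are hypothetical helpers that only mirror the loop branch of pop above, not the full function.

```shell
# Sketch only: build_pop_expr mirrors the loop branch of pop; it is not the full function.
build_pop_expr() {
    bpe_n=$(($1 - 1))
    bpe_i=0
    bpe_args=""
    while [ "$bpe_i" -lt "$bpe_n" ]; do
        bpe_i=$((bpe_i + 1))
        # Appends a literal "${N}" reference; it is expanded later by eval in the caller.
        bpe_args="$bpe_args \"\${$bpe_i}\""
    done
    POP_EXPR="set -- $bpe_args"
}

demo() {
    build_pop_expr $#
    eval "$POP_EXPR"
    printf '<%s>' "$@"
    printf '\n'
}

demo "a b" "c  d" e   # prints: <a b><c  d>
```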
Once you've created the POP_EXPR variable with pop, you can use the following function to change it to omit further arguments:
pop_next() {
if [ -n "$BASH_VERSION" -o -n "$ZSH_VERSION" ]; then
local np="${POP_EXPR##*:}"
np="${np%\}*}"
POP_EXPR="${POP_EXPR%:*}:$((np == 0 ? 0 : np - 1))}\""
return
fi
POP_EXPR="${POP_EXPR% \"*}"
}
pop_next is a much simpler operation than pop in POSIX shells (though it's slightly more complex than pop on zsh and bash). It's used like this:
main() {
pop $#
pop_next
eval "$POP_EXPR"
}
main 1 2 3 #=> 1
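To see why the POSIX branch of pop_next is so cheap, here is a minimal sketch of what the suffix removal does to an already generated expression. The hard-coded POP_EXPR value and the name demo_next are assumptions made for the illustration.

```shell
# Sketch only: POP_EXPR is hard-coded here in the shape the loop branch of pop produces.
demo_next() {
    POP_EXPR='set -- "${1}" "${2}" "${3}"'
    # Remove the shortest suffix starting at a space followed by a double quote,
    # which is the last parameter reference.
    POP_EXPR="${POP_EXPR% \"*}"
    eval "$POP_EXPR"
    printf '%s\n' "$*"
}

demo_next a b c d   # prints: a b
```

A single parameter expansion drops the last reference, so no loop over the argument count is needed.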
Note that if you're not going to use eval "$POP_EXPR" immediately after pop and pop_next, and you're not careful with scoping, some function call in between the operations could change the POP_EXPR variable and mess things up. To avoid this, simply put local POP_EXPR at the start of every function that uses pop, if local is available.
f() {
local POP_EXPR
pop $#
g 1 2
eval "$POP_EXPR"
printf '%s' "f=$*"
}
g() {
local POP_EXPR
pop $#
eval "$POP_EXPR"
printf '%s, ' "g=$*"
}
f a b c #=> g=1, f=a b
This particular function is good enough for my purposes, but I did create a script to generate further optimized functions.
https://gist.github.com/fcard/e26c5a1f7c8b0674c17c7554fb0cd35c#file-popgen-sh
One of the ways to improve performance without using external tools here is to realize that having several small string concatenations is slow, so doing them in batches makes the function considerably faster. Calling the script as popgen.sh -gN1,N2,N3 creates a pop function that handles the operations in batches of N1, N2, or N3 depending on the argument count. The script also contains other tricks, exemplified and explained below:
$ sh popgen \
> -g 10,100 \ # concatenate strings in batches\
> -w \ # overwrite current file\
> -x9 \ # hardcode the result of the first 9 argument counts\
> -t1000 \ # starting at argument count 1000, use external tools\
> -p posix \ # prefix to add to the function name (with an underscore)\
> -s '' \ # suffix to add to the function name (with an underscore)\
> -c \ # use the command popsh instead of seq/sed as the external tool\
> -@ \ # on zsh and bash, use the subarray method (checks on runtime)\
> -+ \ # use bash/zsh extensions (removes runtime check from -@)\
> -nl \ # don't use 'local'\
> -f \ # use 'function' syntax\
> -o pop.sh # output file
An equivalent to the above function can be generated with popgen.sh -t500 -g1 -@.
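The batching idea mentioned earlier can be sketched roughly as follows. This is not the output of popgen.sh, only an illustration under the assumption of a fixed batch size of four; build_batched and demo_batched are hypothetical names.

```shell
# Sketch only: a hand-written batch of four, not code generated by popgen.sh.
build_batched() {
    bb_n=$1      # number of arguments to keep
    bb_i=0
    bb_args=""
    # Append four parameter references per concatenation while at least four remain.
    while [ $((bb_i + 4)) -le "$bb_n" ]; do
        bb_args="$bb_args \"\${$((bb_i + 1))}\" \"\${$((bb_i + 2))}\" \"\${$((bb_i + 3))}\" \"\${$((bb_i + 4))}\""
        bb_i=$((bb_i + 4))
    done
    # Handle the remainder one reference at a time.
    while [ "$bb_i" -lt "$bb_n" ]; do
        bb_i=$((bb_i + 1))
        bb_args="$bb_args \"\${$bb_i}\""
    done
    POP_EXPR="set -- $bb_args"
}

demo_batched() {
    build_batched $(($# - 1))
    eval "$POP_EXPR"
    printf '%s\n' "$*"
}

demo_batched 1 2 3 4 5 6 7   # prints: 1 2 3 4 5 6
```

The first loop performs one concatenation per four arguments instead of four, which is where the saving would come from; the actual gain depends on the shell.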
In the gist containing popgen.sh you will find a popsh.c file that can be compiled and used as a specialized, faster alternative to the default external shell tools. It will be used by any function generated with popgen.sh -c ... if it's accessible as popsh by the shell. Alternatively, you can create any function or tool named popsh and use it in its place.
The script I used for benchmarking can be found on this gist: https://gist.github.com/fcard/f4aec7e567da2a8e97962d5d3f025ad4#file-popbench-sh
The benchmark functions are found in these lines: https://gist.github.com/fcard/f4aec7e567da2a8e97962d5d3f025ad4#file-popbench-sh-L233-L301
The script can be used as such:
$ sh popbench.sh \
> -s dash \ # shell used by the benchmark, can be dash/bash/ash/zsh/ksh.\
> -f posix \ # function to be tested\
> -i 10000 \ # number of times that the function will be called per test\
> -a '\0' \ # replacement pattern to model arguments by index (uses sed)\
> -o /dev/stdout \ # where to print the results to (concatenates, defaults to stdout)\
> -n 5,10,1000 # argument sizes to test
It will output a time -p style sheet with real, user and sys time values, as well as an int value (for internal) that is calculated inside the benchmark process using date.
The following are the int results of calls to
$ sh popbench.sh -s $shell -f $function -i 10000 -n 1,5,10,100,1000,10000
Here posix refers to the second and third clauses of the pop function, subarray refers to the first, while final refers to the whole function.
value count 1 5 10 100 1000 10000
---------------------------------------------------------------------------------------
dash/final 0m0.109s 0m0.183s 0m0.275s 0m2.270s 0m16.122s 1m10.239s
ash/final 0m0.104s 0m0.175s 0m0.273s 0m2.337s 0m15.428s 1m11.673s
ksh/final 0m0.409s 0m0.557s 0m0.737s 0m3.558s 0m19.200s 1m40.264s
bash/final 0m0.343s 0m0.414s 0m0.470s 0m1.719s 0m17.508s 3m12.496s
---------------------------------------------------------------------------------------
bash/subarray 0m0.135s 0m0.179s 0m0.224s 0m1.357s 0m18.911s 3m18.007s
dash/posix 0m0.171s 0m0.290s 0m0.447s 0m3.610s 0m17.376s 1m8.852s
ash/posix 0m0.109s 0m0.192s 0m0.285s 0m2.457s 0m14.942s 1m10.062s
ksh/posix 0m0.416s 0m0.581s 0m0.768s 0m4.677s 0m18.790s 1m40.407s
bash/posix 0m0.409s 0m0.739s 0m1.145s 0m10.048s 0m58.449s 40m33.024s
For large argument counts, resetting the arguments with an eval of set -- ... is very slow on zsh no matter the method, save for eval 'set -- "${@:1:$# - 1}"'. Even as simple a modification as changing it to eval "set -- ${@:1:$# - 1}" (ignoring that it doesn't work for arguments with spaces) makes it two orders of magnitude slower.
value count 1 5 10 100 1000 10000
---------------------------------------------------------------------------------------
zsh/subarray 0m0.203s 0m0.227s 0m0.233s 0m0.461s 0m3.643s 0m38.396s
zsh/final 0m0.399s 0m0.416s 0m0.441s 0m0.722s 0m4.205s 0m37.217s
zsh/posix 0m0.718s 0m0.913s 0m1.182s 0m6.200s 0m46.516s 42m27.224s
zsh/eval-zsh 0m0.419s 0m0.353s 0m0.375s 0m0.853s 0m5.771s 32m59.576s
For more benchmarks, including runs that use only external tools, the C popsh tool, or the naive algorithm, see this file:
https://gist.github.com/fcard/f4aec7e567da2a8e97962d5d3f025ad4#file-benchmarks-md
It's generated like this:
$ git clone https://gist.github.com/f4aec7e567da2a8e97962d5d3f025ad4.git popbench
$ cd popbench
$ sh popgen_run.sh
$ sh popbench_run.sh --fast # or without --fast if you have a day to spare
$ sh poptable.sh -g >benchmarks.md
This has been the result of a week of research on the subject, and I thought I'd share it. Hopefully it's not too long; I tried to trim it down to the main information, with links to the gists. This was initially written as an answer to (Remove last argument from argument list of shell script (bash)), but I felt the focus on POSIX made it off topic there.
All the code in the gists linked here is licensed under the MIT license.
Upvotes: 4