Reputation: 701
A POSIX-compliant shell provides mechanisms like this to iterate over collections of strings:
for x in $(seq 1 5); do
echo $x
done
But, how do I iterate over each character of a word?
Upvotes: 12
Views: 3546
Reputation: 29
I was developing a script that needed a stack, and the same pop operation can be used to iterate through a string one character at a time:
#!/bin/sh
# posix script
pop () {
# $1 top
# $2 stack
eval $1='$(expr "'\$$2'" : "\(.\).*")'
eval $2='$(expr "'\$$2'" : ".\(.*\)" )'
}
string="ABCDEFG"
while [ "$string" != "" ]
do
pop c string
echo "-- $c"
done
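For comparison, here is a minimal sketch of the same pop interface built on parameter expansion instead of expr, so the shell does not have to run expr for every character. This is an alternative, not the method above; the scratch variables _pop_val and _pop_rest are names made up for this example.

```shell
#!/bin/sh
# Sketch only: pop the first character of the variable named by $2
# into the variable named by $1, using parameter expansion instead of expr.
# _pop_val and _pop_rest are hypothetical scratch variables (POSIX sh has no "local").
pop () {
    eval "_pop_val=\$$2"                     # copy the current value of the stack
    _pop_rest=${_pop_val#?}                  # everything after the first character
    eval "$1=\${_pop_val%\"\$_pop_rest\"}"   # strip that suffix literally, leaving one character
    eval "$2=\$_pop_rest"                    # store the shortened stack back
}

string='AB *'
while [ -n "$string" ]
do
    pop c string
    printf '%s\n' "-- $c"
done
```

The quoted "$_pop_rest" inside the expansion keeps characters such as * from being read as a pattern.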
Upvotes: -1
Reputation: 457
Use getopts to process the input one character at a time. The : optstring instructs getopts to ignore illegal options and set OPTARG to the offending character. The leading - in the input makes getopts treat the string as a group of options. If getopts encounters a colon, it will not set OPTARG, so the script uses parameter expansion to return : when OPTARG is unset or null.
#!/bin/sh
IFS='
'
split_string () {
OPTIND=1;
while getopts ":" opt "-$1"
do echo "'${OPTARG:-:}'"
done
}
while read -r line;do
split_string "$line"
done
As with the accepted answer, this processes strings byte-wise instead of character-wise, corrupting multibyte codepoints. The trick is to detect multibyte codepoints, concatenate their bytes and then print them:
#!/bin/sh
IFS='
'
split_string () {
OPTIND=1;
while getopts ":" opt "$1";do
case "${OPTARG:=:}" in
([[:print:]])
[ -n "$multi" ] && echo "$multi" && multi=
echo "$OPTARG" && continue
esac
multi="$multi$OPTARG"
case "$multi" in
([[:print:]]) echo "$multi" && multi=
esac
done
[ -n "$multi" ] && echo "$multi"
}
while read -r line;do
split_string "-$line"
done
Here the extra case "$multi" is used to detect when the multi buffer contains a printable character. This works on shells like Bash and Zsh, but Dash and busybox ash ignore the locale and do not pattern match multibyte codepoints.
This degrades somewhat nicely: Dash/ash treat sequences of multibyte codepoints as one character, but handle multibyte characters surrounded by single byte characters fine.
Depending on your requirements it may be preferable not to split consecutive multibyte codepoints anyway, as the next codepoint may be a combining character which modifies the character before it.
This won't handle the case where a single byte character is followed by a combining character.
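To make the byte-wise versus character-wise distinction concrete, the sketch below counts the bytes in a string that is a single character. byte_count is a hypothetical helper name, and the example assumes the character é is encoded as UTF-8.

```shell
#!/bin/sh
# "é" (U+00E9) written as its two UTF-8 bytes in octal,
# so this script itself stays pure ASCII.
e_acute=$(printf '\303\251')

# byte_count is a hypothetical helper: print the number of bytes in $1.
byte_count () {
    # wc -c counts bytes; tr removes the space padding some wc versions add
    printf '%s' "$1" | wc -c | tr -d ' '
}

byte_count "a"          # prints 1
byte_count "$e_acute"   # prints 2
```

A splitter that emits one byte per iteration would therefore hand back two fragments for é, neither of which is valid on its own.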
Upvotes: 3
Reputation: 8406
This works in dash and busybox:
echo 'ab * cd' | grep -o .
Output (the two spaces are also emitted, each on a line of its own, though they are not visible here):
a
b
*
c
d
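To act on each character rather than just print it, the grep output can be fed to a read loop. This is a rough sketch that assumes your grep supports -o (GNU, BSD and busybox grep do); each_char_grep is a made-up name for the example.

```shell
#!/bin/sh
# each_char_grep is a hypothetical helper: run a loop body once per character of $1.
# Assumes grep supports the -o option.
each_char_grep () {
    printf '%s\n' "$1" | grep -o . | while IFS= read -r c; do
        # IFS= and -r keep spaces and backslashes intact;
        # the brackets only make whitespace visible in the output
        printf '[%s]\n' "$c"
    done
}

each_char_grep 'ab * cd'
```

Because the loop runs on the right-hand side of a pipeline, variables set inside it may not be visible after the loop ends, and newlines in the input are never matched by the . pattern.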
Upvotes: 2
Reputation: 125788
It's a little circuitous, but I think this'll work in any POSIX-compliant shell. I've tried it in dash, but I don't have busybox handy to test with.
var='ab * cd'
tmp="$var" # The loop will consume the variable, so make a temp copy first
while [ -n "$tmp" ]; do
rest="${tmp#?}" # All but the first character of the string
first="${tmp%"$rest"}" # Remove $rest, and you're left with the first character
echo "$first"
tmp="$rest"
done
Output (the two spaces are also printed, each on a line of its own, though they are not visible here):
a
b
*
c
d
Note that the double-quotes around the right-hand side of assignments are not needed; I just prefer to use double-quotes around all expansions rather than trying to keep track of where it's safe to leave them off. On the other hand, the double-quotes in [ -n "$tmp" ] are absolutely necessary, and the inner double-quotes in first="${tmp%"$rest"}" are needed if the string contains a pattern character such as "*".
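The effect of the inner double-quotes can be seen in a small side-by-side sketch. The two function names are hypothetical and exist only for this comparison.

```shell
#!/bin/sh
# Hypothetical helpers: both try to print the first character of $1.
# "rest" is a plain global here because POSIX sh has no "local".

first_char_quoted () {
    rest="${1#?}"
    # "$rest" is quoted, so it is removed as a literal suffix
    printf '%s\n' "${1%"$rest"}"
}

first_char_unquoted () {
    rest="${1#?}"
    # $rest is unquoted, so a * inside it acts as a wildcard pattern
    printf '%s\n' "${1%$rest}"
}

first_char_quoted   'a*b'   # prints: a
first_char_unquoted 'a*b'   # prints: a*
```

In the unquoted version, rest is *b, and the shortest suffix matching the pattern *b is just b, so only that last character is removed.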
Upvotes: 13