Luis Lavaire

Reputation: 701

How to iterate over the characters of a string in a POSIX shell script?

A POSIX-compliant shell provides mechanisms like this to iterate over collections of strings:

for x in $(seq 1 5); do
    echo $x
done

But how do I iterate over each character of a word?

Upvotes: 12

Views: 3546

Answers (4)

MarzVIX

Reputation: 29

I was developing a script that needed a stack, and the same pop operation can be used to iterate through a string one character at a time:

#!/bin/sh
# posix script

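pop () {
    # $1: name of the variable that receives the top (the first character)
    # $2: name of the variable holding the stack (the string)
    # The "x" prefix stops expr from treating the string as an operator.
    eval $1='$(expr "x'\$$2'" : "x\(.\).*")'
    eval $2='$(expr "x'\$$2'" : "x.\(.*\)")'
}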

string="ABCDEFG"
while [ "$string" != "" ]
do
    pop c string
    echo "-- $c"
done

Upvotes: -1

David Farrell

Reputation: 457

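Use getopts to process the input one character at a time. The leading : in the option string puts getopts in silent mode: for each unrecognized option it sets OPTARG to the option character instead of printing an error. The leading - in the input makes getopts treat the string as a group of options.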

When getopts encounters a colon in the input, it does not set OPTARG, so the script uses the ${OPTARG:-:} parameter expansion to substitute : when OPTARG is unset or null.

#!/bin/sh
IFS='
'
split_string () {
  OPTIND=1;
  while getopts ":" opt "-$1"
    do echo "'${OPTARG:-:}'"
  done
}

while read -r line;do
  split_string "$line"
done

As with the accepted answer, this processes strings byte-wise instead of character-wise, corrupting multibyte codepoints. The trick is to detect multibyte codepoints, concatenate their bytes and then print them:

#!/bin/sh
IFS='
'
split_string () {
  OPTIND=1;
  while getopts ":" opt "$1";do
    case "${OPTARG:=:}" in
      ([[:print:]])
        [ -n "$multi" ] && echo "$multi" && multi=
        echo "$OPTARG" && continue
    esac
    multi="$multi$OPTARG"
    case "$multi" in
      ([[:print:]]) echo "$multi" && multi=
    esac
  done
  [ -n "$multi" ] && echo "$multi"
}
while read -r line;do
  split_string "-$line"
done

Here the extra case "$multi" is used to detect when the multi buffer contains a printable character. This works in shells like Bash and Zsh, but Dash and busybox ash ignore the locale and do not pattern-match multibyte codepoints.

This degrades fairly gracefully: Dash/ash treat a run of consecutive multibyte codepoints as one character, but handle multibyte characters surrounded by single-byte characters fine.

Depending on your requirements it may be preferable not to split consecutive multibyte codepoints anyway, as the next codepoint may be a combining character which modifies the character before it.

This won't handle the case where a single-byte character is followed by a combining character.

Upvotes: 3

agc

Reputation: 8406

This works in dash and busybox, although the -o option of grep is an extension that POSIX does not specify:

echo 'ab * cd' | grep -o .

Output:

a
b

*

c
d
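
To act on each character rather than just print it, the output of the pipeline can be fed to a while read loop. This is a minimal sketch, assuming a grep that supports -o; the function name each_char_grep is hypothetical and not part of the answer above:

```shell
#!/bin/sh
# each_char_grep is a hypothetical helper name used for illustration.
# Assumes grep supports -o (GNU and busybox grep do; POSIX does not require it).
each_char_grep () {
    printf '%s\n' "$1" | grep -o . | while IFS= read -r c; do
        # IFS= and -r keep spaces and backslashes intact
        printf '[%s]\n' "$c"
    done
}

each_char_grep 'ab * cd'
```

Because the loop is the last stage of a pipeline, it typically runs in a subshell, so variables assigned inside it are not visible after the loop ends.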

Upvotes: 2

Gordon Davisson

Reputation: 125788

It's a little circuitous, but I think this will work in any POSIX-compliant shell. I've tried it in dash, but I don't have busybox handy to test with.

var='ab * cd'

tmp="$var"    # The loop will consume the variable, so make a temp copy first
while [ -n "$tmp" ]; do
    rest="${tmp#?}"    # All but the first character of the string
    first="${tmp%"$rest"}"    # Remove $rest, and you're left with the first character
    echo "$first"
    tmp="$rest"
done

Output:

a
b

*

c
d

Note that the double-quotes around the right-hand side of assignments are not needed; I just prefer to use double-quotes around all expansions rather than trying to keep track of where it's safe to leave them off. On the other hand, the double-quotes in [ -n "$tmp" ] are absolutely necessary, and the inner double-quotes in first="${tmp%"$rest"}" are needed if the string contains "*".
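
The same loop can be wrapped in a function that calls a command once per character. This is only a sketch: the names for_each_char and show are hypothetical, and the fec_ prefix on the variables is an assumption made to avoid clashes, since POSIX sh has no local variables.

```shell
#!/bin/sh
# for_each_char and show are hypothetical names used for illustration.
# Calls the command named in $2 once per character of $1.
for_each_char () {
    fec_tmp=$1
    while [ -n "$fec_tmp" ]; do
        fec_rest=${fec_tmp#?}              # everything after the first character
        "$2" "${fec_tmp%"$fec_rest"}"      # first character, passed to the callback
        fec_tmp=$fec_rest
    done
}

show () { printf '<%s>\n' "$1"; }

for_each_char 'a*b' show
```

The inner quotes in "${fec_tmp%"$fec_rest"}" matter here for the same reason as above: without them, the * in the remainder would be treated as a pattern. Whether ? consumes one byte or one multibyte character depends on the shell and the locale.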

Upvotes: 13
