Reputation: 9866
I have this POSIX-compliant shell script. It takes a string delimited by | and prepends a - to each substring that is a single character long:
#!/bin/sh
printf '%s\n' "k|k|jill|hill|k" | sed 's/\([|]\|^\)\([[:alnum:]]\)\([|]\|$\)/\1-\2\3/g'
This outputs:
-k|k|jill|hill|-k
Notice that it doesn't account for the k sandwiched between two delimiters (i.e., |k|).
Even more strangely, if I replace the anchors in the original snippet with arbitrary literals, it does prepend a - to the middle k (note the changes: ^ to something; $ to different), but obviously not to the first and last k's:
#!/bin/sh
printf '%s\n' "k|k|jill|hill|k" | sed 's/\([|]\|something\)\([[:alnum:]]\)\([|]\|different\)/\1-\2\3/g'
Outputs:
k|-k|jill|hill|k
At first I thought that it was because the $ and ^ anchors weren't optional. However, they evidently are optional: in the first example, the first match does not use $ and the last match does not use ^.
I'm very curious to know: why is this not working, and can I do what I want with a similar sed expression?
Upvotes: 3
Views: 83
Reputation: 246837
Note that if you change the sed script from a global search and replace to a loop, you can get your desired output:
printf '%s\n' "k|k|jill|hill|k" | sed 's/\([|]\|^\)\([[:alnum:]]\)\([|]\|$\)/\1-\2\3/g'
-k|k|jill|hill|-k
versus
printf '%s\n' "k|k|jill|hill|k" | sed '
:a
s/\([|]\|^\)\([[:alnum:]]\)\([|]\|$\)/\1-\2\3/
ta
'
-k|-k|jill|hill|-k
ref: https://www.gnu.org/software/sed/manual/html_node/Programming-Commands.html
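For anyone who needs the loop without the \| alternation (which is a GNU extension in basic regular expressions), here is a minimal sketch of the same idea. It assumes fields never contain |. The function name prefix_singles is made up for illustration. It temporarily wraps the line in delimiters so every field is surrounded by |, then repeats a non-global substitution until nothing matches. The loop terminates because a field that already starts with - can no longer match [[:alnum:]].

```shell
#!/bin/sh
# Hypothetical helper (not from the original posts): prepend "-" to every
# single-character field of a |-delimited string.
prefix_singles() {
  printf '%s\n' "$1" |
    sed -e 's/^/|/' -e 's/$/|/' \
        -e ':a' \
        -e 's/|\([[:alnum:]]\)|/|-\1|/' \
        -e 'ta' \
        -e 's/^|//' -e 's/|$//'
  # 1. wrap the line in | so the first and last fields need no anchors
  # 2. :a / ta re-run the substitution from the start of the line
  #    until it no longer changes anything
  # 3. strip the two sentinel delimiters again
}

prefix_singles "k|k|jill|hill|k"
```

Because each pass restarts from the beginning of the line, a delimiter that ended one match can begin the next one, which is exactly what the /g version cannot do.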
Upvotes: 4
Reputation: 48711
The engine can't match the k in the middle because the previous successful match already consumed the characters right before it (k|), so there is no | left to start another match from. That said, if your input string were:
kk|k|jill|hill|k
you would have seen the desired output for the middle k. As a workaround, I'd suggest setting the -r option (GNU sed) to enable ERE syntax and using a word boundary token:
printf '%s\n' "k|k|jill|hill|k" | sed -r 's/\b([[:alnum:]])(\||$)/-\1\2/g'
or more generally:
printf '%s\n' "k|k|jill|hill|k" | sed -r 's/\b[[:alnum:]]\b/-\0/g'
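The consumption behaviour described above can be reproduced with a stripped-down pattern. This is only an illustrative sketch: the function names one_pass and two_pass and the a|a|a|a input are invented for the demonstration. A single /g pass skips every field whose leading | was already used as the trailing | of the previous match, and a second pass over the result picks up the ones that were skipped.

```shell
#!/bin/sh
# Hypothetical demo helpers (names are not from the original posts).

# One global pass: matches cannot overlap, so the | that ends one match
# is not available to begin the next.
one_pass() {
  printf '%s\n' "$1" | sed 's/|a|/|-a|/g'
}

# The same substitution applied twice: the second pass sees the fields
# the first pass had to skip.
two_pass() {
  printf '%s\n' "$1" | sed -e 's/|a|/|-a|/g' -e 's/|a|/|-a|/g'
}

one_pass 'a|a|a|a'   # prints a|-a|a|a
two_pass 'a|a|a|a'   # prints a|-a|-a|a
```

The first and last fields stay untouched here because this simplified pattern has no handling for the start or end of the line.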
Upvotes: 3