Is there a command for substituting a set of characters by a set of strings?

I would like to substitute a set of single-byte characters with a set of literal strings in a stream, without any constraint on the line length.

#!/bin/bash

for (( i = 1; i <= 0x7FFFFFFFFFFFFFFF; i++ ))
do
    printf '\a,\b,\t,\v'
done |
chars_to_strings $'\a\b\t\v' '<bell>' '<backspace>' '<horizontal-tab>' '<vertical-tab>'

The expected output would be:

<bell>,<backspace>,<horizontal-tab>,<vertical-tab><bell>,<backspace>,<horizontal-tab>,<vertical-tab><bell>...

I can think of a bash function that would do that, something like:

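chars_to_strings() {
    local chars=$1 delim buffer i
    shift
    local -a strings=("$@")     # strings[i] replaces the i-th character of "$chars"
    while true
    do
        delim=''
        IFS='' read -r -d '.' -n 4096 buffer && (( ${#buffer} != 4096 )) && delim='.'

        if [[ -n "${delim:+_}" ]] || [[ -n "${buffer:+_}" ]]
        then
            # Do the replacements in "$buffer" (quoted so that each character and
            # string is taken literally, not as a glob pattern or an '&' reference)
            for (( i = 0; i < ${#chars}; i++ ))
            do
                buffer=${buffer//"${chars:i:1}"/"${strings[i]}"}
            done

            printf "%s%s" "$buffer" "$delim"
        else
            break
        fi
    done
}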

But I'm looking for a more efficient way, any thoughts?

Upvotes: 1

Answers (5)

RARE Kpop Manifesto

Reputation: 2865

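Don't waste FS/OFS: use the built-in variables to handle 2 of the 5 replacements. With FS='\13' (vertical tab), the NF = NF assignment rebuilds each record with OFS='<v-tab>' in place of every vertical tab, ORS='<newline>' takes care of the newlines, and gsub() handles the other three characters. The trailing ^_ raises the last term to the power of the uninitialized variable _ (i.e. 0), which always yields 1, so the pattern is always true and every record is printed: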

echo $'   \t   abc xyz    \t  \a   \n\n ' |

mawk 'gsub(/\7/,  "<bell>", $!(NF = NF)) + gsub(/\10/,"<bs>") +\
      gsub(/\11/,"<h-tab>")^_' OFS='<v-tab>'  FS='\13'  ORS='<newline>'

   <h-tab>   abc xyz    <h-tab>  <bell>   <newline><newline> <newline>
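For readers who find the golfed mawk line hard to follow, here is a sketch of the same FS/OFS/ORS idea spelled out for any POSIX awk. The function name `show_controls` and the tag strings are illustrative, not part of the original answer. As with the mawk version, awk ends every record with ORS, so `<newline>` is also emitted after a last line that had no trailing newline.

```shell
#!/bin/bash
# Readable sketch of the FS/OFS/ORS trick in plain POSIX awk:
# vertical tabs are the field separator, so rebuilding $0 turns each one
# into OFS; every newline becomes ORS; gsub() handles the other three.
show_controls() {
    awk -v FS='\v' -v OFS='<v-tab>' -v ORS='<newline>' '{
        $1 = $1                  # assigning a field forces $0 to be rebuilt with OFS
        gsub(/\a/, "<bell>")
        gsub(/\b/, "<bs>")
        gsub(/\t/, "<h-tab>")
        print                    # print appends ORS, i.e. "<newline>"
    }'
}
```

For example, `printf 'a\vb\n' | show_controls` writes `a<v-tab>b<newline>`.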

Upvotes: 1

Ed Morton

Reputation: 204055

To have NO constraint on the line length you could do something like this with GNU awk:

awk -v RS='.{1,100}' -v ORS= '{
    $0 = RT
    gsub(foo,bar)
    print
}'

That will read and process the input 100 chars at a time no matter which chars are present, whether it has newlines or not, and even if the input was one multi-terabyte line.

Replace gsub(foo,bar) with whatever substitution(s) you have in mind, e.g.:

$ printf '\a,\b,\t,\v' |
    awk -v RS='.{1,100}' -v ORS= '{
        $0 = RT
        gsub(/\a/,"<bell>")
        gsub(/\b/,"<backspace>")
        gsub(/\t/,"<horizontal-tab>")
        gsub(/\v/,"<vertical-tab>")
        print
    }'
<bell>,<backspace>,<horizontal-tab>,<vertical-tab>

And of course it'd be trivial to pass a list of old and new strings to awk rather than hardcoding them; you'd just have to escape any regexp metacharacters in the old strings, and any backreference metacharacters (& and \) in the new strings, before calling gsub().
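A sketch of that generalization, assuming GNU awk (a regexp RS and RT are gawk extensions). The `chars_to_strings CHARS STRING...` calling convention is taken from the question; the lookup-table loop is a swapped-in technique, not Ed Morton's code: instead of escaping metacharacters for gsub(), it copies each chunk byte by byte and substitutes from an array, so nothing ever needs escaping. Running gawk under LC_ALL=C is an assumption made so it works on bytes rather than multibyte characters.

```shell
#!/bin/bash
# Sketch (assumes GNU awk): generic chars_to_strings built on RS='.{1,N}' / RT.
# Usage: chars_to_strings CHARS STRING1 STRING2 ...
chars_to_strings() {
    LC_ALL=C gawk -v RS='.{1,4096}' -v ORS= '
    BEGIN {
        # ARGV[1] is the list of single-byte characters, ARGV[2..] the
        # replacement strings; ARGV values get no escape processing.
        n = length(ARGV[1])
        for (i = 1; i <= n; i++)
            map[substr(ARGV[1], i, 1)] = ARGV[i + 1]
        ARGC = 1    # hide the arguments so gawk reads stdin instead
    }
    {
        # RT holds the 1..4096 bytes matched by RS; copy them byte by byte,
        # swapping in a replacement wherever one is defined.
        out = ""
        len = length(RT)
        for (i = 1; i <= len; i++) {
            c = substr(RT, i, 1)
            out = out ((c in map) ? map[c] : c)
        }
        print out
    }' "$@"
}
```

Reading the arguments from ARGV inside BEGIN sidesteps the escape processing that -v applies to its values, and resetting ARGC keeps gawk from later treating them as input files or var=value assignments. If fewer strings than characters are given, the extra characters map to the empty string, i.e. they are deleted.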

Upvotes: 2

markp-fuso

Reputation: 35006

Assuming the overall objective is to process a stream of data in real time, without having to wait for an EOL/end-of-buffer occurrence to trigger processing...

A few items:

- Continue to use the while/read -n loop to read a chunk of data from the incoming stream and store it in the buffer variable.
- Push the conversion code into something better suited to string manipulation (i.e., something other than bash); for the sake of discussion we'll choose awk.
- Within the while/read -n loop, printf "%s\n" "${buffer}" and pipe the output of the while loop into awk. NOTE: the key item is to introduce an explicit \n into the stream so as to trigger awk processing for each new 'line' of input; the OP can decide whether this additional \n must be distinguished from a \n occurring in the original stream of data.
- awk then parses each line of input according to the replacement logic, making sure to prepend anything left over to the next line of input (i.e., for when the while/read -n breaks an item in the 'middle').

General idea:

chars_to_strings() {
    local buffer
    # IFS= keeps leading/trailing tabs in each chunk; the '|| [[ -n ... ]]'
    # test keeps the final partial chunk that read returns together with EOF
    while IFS= read -r -n 15 buffer || [[ -n "${buffer}" ]]   # using '15' for demo purposes otherwise replace with '4096' or whatever OP wants
    do
        printf "%s\n" "${buffer}"
    done | awk '{print NR,FNR,length($0)}'   # replace 'print ...' with OP's replacement logic
}

Take for a test drive:

for (( i = 1; i <= 20; i++ ))
do
    printf '\a,\b,\t,\v'
    sleep 0.1                 # add some delay to data being streamed to chars_to_strings()
done | chars_to_strings

1 1 15                        # output starts printing right away
2 2 15                        # instead of waiting for the 'for'
3 3 15                        # loop to complete
4 4 15
5 5 15
6 6 15
7 7 15
8 8 15
9 9 15
10 10 5                       # the final, partial chunk
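A sketch of the steps listed above with real replacement logic in awk. The framing is an assumption, not part of the original answer: each chunk is sent as one line whose first byte is `1` when `read` stopped at a newline from the input and `0` otherwise, which is one way to tell the injected `\n` apart from original ones. `CHUNK_SIZE` is a hypothetical knob. Since the replacements are single characters, no leftover has to be carried between chunks. mawk may additionally need `-W interactive` to avoid buffering its input.

```shell
#!/bin/bash
# Sketch: read/printf/awk pipeline with concrete replacement logic.
# Assumed framing: first byte "1" = read stopped at an input newline,
# "0" = it did not, so awk re-emits only the original newlines.
# CHUNK_SIZE is a hypothetical override for the chunk length.
chars_to_strings() {
    local buffer status chunk=${CHUNK_SIZE:-4096}
    while :
    do
        IFS= read -r -n "$chunk" buffer
        status=$?
        if (( status == 0 && ${#buffer} < chunk ))
        then
            printf '1%s\n' "$buffer"    # read stopped at an input newline
        elif [[ -n $buffer ]]
        then
            printf '0%s\n' "$buffer"    # full chunk, or partial chunk at EOF
        fi
        (( status == 0 )) || break      # non-zero status: EOF reached
    done |
    awk '{
        nl = (substr($0, 1, 1) == "1") ? "\n" : ""
        s = substr($0, 2)
        gsub(/\a/, "<bell>", s)
        gsub(/\b/, "<backspace>", s)
        gsub(/\t/, "<horizontal-tab>", s)
        gsub(/\v/, "<vertical-tab>", s)
        printf "%s%s", s, nl
        fflush()                        # emit each chunk right away
    }'
}
```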

A variation on this idea using a named pipe:

mkfifo /tmp/pipeX

sleep infinity > /tmp/pipeX &                      # keep pipe open (in the background) so awk does not exit

awk '{print NR,FNR,length($0)}' < /tmp/pipeX &

chars_to_strings() {
    local buffer
    while IFS= read -r -n 15 buffer || [[ -n "${buffer}" ]]
    do
        printf "%s\n" "${buffer}"
    done > /tmp/pipeX
}

Take for a test drive:

for (( i = 1; i <= 20; i++ ))
do
    printf '\a,\b,\t,\v'
    sleep 0.1
done | chars_to_strings

1 1 15                        # output starts printing right away
2 2 15                        # instead of waiting for the 'for'
3 3 15                        # loop to complete
4 4 15
5 5 15
6 6 15
7 7 15
8 8 15
9 9 15
10 10 5                       # the final, partial chunk

# kill background 'awk' and/or 'sleep infinity' when no longer needed

Upvotes: 2

tripleee

Reputation: 189679

For a simple one-liner with reasonable portability, try Perl.

for (( i = 1; i <= 0x7FFFFFFFFFFFFFFF; i++ ))
do
    printf '\a,\b,\t,\v'
done |
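perl -pe 's/\a/<bell>/g;
  s/\x08/<backspace>/g;s/\t/<horizontal-tab>/g;s/\x0b/<vertical-tab>/g'

In a Perl regex, \b is a word boundary and \v matches any vertical whitespace (including the newline), so the backspace and vertical tab are written as \x08 and \x0b.

Perl grows its line buffer as needed, so it is not limited by lines that are longer than its input buffer; each line is still held in memory in full, though.

Perl by itself is not POSIX, of course; but it can be expected to be installed on any even remotely modern platform (short of perhaps embedded systems etc.).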
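If even a single line may be too large to hold in memory, here is a sketch using Perl's fixed-size record mode: assigning a reference to an integer to `$/` makes each read return up to that many bytes instead of a line, and `$|` enables autoflush for streaming output. The function name `chars_to_strings_perl` and the 4096-byte record size are illustrative choices, not part of the original answer.

```shell
#!/bin/bash
# Sketch: Perl in fixed-size record mode, so no input line has to fit in
# memory. $/ = \4096 makes each read return up to 4096 bytes; $| = 1
# turns on autoflush so output streams out as records are processed.
chars_to_strings_perl() {
    perl -pe '
        BEGIN {
            $/ = \4096;
            $| = 1;
            %name = ("\a"   => "<bell>",           "\x08" => "<backspace>",
                     "\t"   => "<horizontal-tab>", "\x0b" => "<vertical-tab>");
        }
        # \x08 and \x0b rather than \b and \v, which mean other things in a regex
        s/([\a\x08\t\x0b])/$name{$1}/g;
    '
}
```

Because no `-l` is used, nothing is added to or removed from each record, so original newlines pass through unchanged.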

Upvotes: 1

Ionuț G. Stan

Reputation: 179169

Since you seem to be okay with using ANSI C quoting via $'...' strings, then maybe use sed?

sed $'s/\a/<bell>/g; s/\b/<backspace>/g; s/\t/<horizontal-tab>/g; s/\v/<vertical-tab>/g'

Or, via separate commands:

sed -e $'s/\a/<bell>/g' \
    -e $'s/\b/<backspace>/g' \
    -e $'s/\t/<horizontal-tab>/g' \
    -e $'s/\v/<vertical-tab>/g'

Or, using awk, which replaces newline characters too (by customizing the Output Record Separator, i.e., the ORS variable):

$ printf '\a,\b,\t,\v\n' | awk -vORS='<newline>' '
  {
    gsub(/\a/, "<bell>")
    gsub(/\b/, "<backspace>")
    gsub(/\t/, "<horizontal-tab>")
    gsub(/\v/, "<vertical-tab>")
    print $0
  }
'
<bell>,<backspace>,<horizontal-tab>,<vertical-tab><newline>

Upvotes: 2
