wolfrevo

Reputation: 7303

Substitute values with ascii chars using sed

We have files in which some characters are represented by their decimal (!) ASCII values wrapped as (cid:N), e.g. (cid:104) for h. The string hello is thus represented as (cid:104)(cid:101)(cid:108)(cid:108)(cid:111).

How can I substitute this with the corresponding ascii characters using sed?

Here is an example file:

$ cat input.txt
first line
pre (cid:104)(cid:101)(cid:108)(cid:108)(cid:111) post
last line

What I've tried so far is:

$ x="(cid:104)(cid:101)(cid:108)(cid:108)(cid:111)"
$ echo $x | sed 's/(cid:\([^\)]*\))/\1/g'
104101108108111

But we need the output to be hello:

$ cat output.txt
first line
pre hello post
last line

I'm trying to use printf inside sed, but I cannot figure out how to pass the backreference \1 to printf:

sed 's/(cid:\([^\)]*\))/'`printf "\x$(printf %x \1)"`'/g'

Upvotes: 2

Views: 984

Answers (2)

Sundeep

Reputation: 23667

$ cat input.txt 
first line
pre (cid:104)(cid:101)(cid:108)(cid:108)(cid:111) post
last line

$ perl -pe 's/\(cid:(\d+)\)/chr($1)/ge' input.txt > output.txt
$ cat output.txt
first line
pre hello post
last line

Thanks to @123 for suggesting chr($1) instead of sprintf "%c", $1. See the Perl documentation for chr.

Reference: Integer ASCII value to character in BASH using printf
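
For completeness, the printf technique that the reference title points to can be sketched in plain shell as follows. It converts the decimal code to octal and lets printf interpret the resulting \NNN escape. The helper name ascii_chr is hypothetical, and this only decodes a list of codes; it does not parse the (cid:N) tokens itself.

```shell
# Hypothetical helper: print the character for one decimal ASCII code.
ascii_chr() {
  # The inner printf renders the code as three octal digits (104 -> 150);
  # the outer printf then interprets the escape \150 as the character h.
  printf "\\$(printf '%03o' "$1")"
}

# Decode the codes from the question's example one by one.
out=$(for code in 104 101 108 108 111; do ascii_chr "$code"; done)

printf '%s\n' "$out"
```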

Upvotes: 3

fedorqui

Reputation: 289835

Using %c you can convert an ASCII code into its corresponding character:

$ awk 'BEGIN {printf "%c", 104}'
h

So it is a matter of extracting the numbers from within (cid:XX). I do this by setting FS to ( and looping through the fields, starting from the second one:

awk -v FS='(' '{for (i=2; i<=NF; i++) {
                  r=gensub(/cid:([0-9]+)\)/, "\\1", "g", $i);
                  printf "%c", r+0
                  }
               }' file

This uses gensub() and accesses the captured groups as described in GNU awk: accessing captured groups in replacement text. Hence it depends on GNU awk.

For your given input it prints only the decoded characters, without the surrounding text or a trailing newline:

$ awk -v FS='(' '{for (i=2; i<=NF; i++) {r=gensub(/cid:([0-9]+)\)/, "\\1", "g", $i); printf "%c", r+0}}' file
hello
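
If the surrounding text has to be kept, as in the output.txt shown in the question, one possible variation is to walk through each line with match() and substr() instead of splitting on (. This is a sketch rather than a tested drop-in replacement; it avoids gensub(), so it should not need GNU awk, and the variable names out and code are only for illustration.

```shell
# Recreate the sample input from the question.
printf '%s\n' 'first line' \
  'pre (cid:104)(cid:101)(cid:108)(cid:108)(cid:111) post' \
  'last line' > input.txt

awk '{
  out = ""
  # Repeatedly locate the next (cid:N) token in the remaining text.
  while (match($0, /\(cid:[0-9]+\)/)) {
    # The digits sit between the 5-character prefix "(cid:" and the final ")".
    code = substr($0, RSTART + 5, RLENGTH - 6)
    # Keep the text before the token, then append the decoded character.
    out = out substr($0, 1, RSTART - 1) sprintf("%c", code + 0)
    # Continue with whatever follows the token.
    $0 = substr($0, RSTART + RLENGTH)
  }
  # Print the rebuilt line plus any text left after the last token.
  print out $0
}' input.txt > output.txt

cat output.txt
```

Lines without any (cid:N) token skip the loop and are printed unchanged.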

Upvotes: 0
