Reputation: 7303
We have files in which some characters are represented by their decimal(!) ASCII values enclosed in (cid:#), e.g. (cid:104) for h. The string hello is thus represented as (cid:104)(cid:101)(cid:108)(cid:108)(cid:111).
How can I substitute these with the corresponding ASCII characters using sed?
Here is an example file:
$ cat input.txt
first line
pre (cid:104)(cid:101)(cid:108)(cid:108)(cid:111) post
last line
What I've tried so far is:
$ x="(cid:104)(cid:101)(cid:108)(cid:108)(cid:111)"
$ echo $x | sed 's/(cid:\([^\)]*\))/\1/g'
104101108108111
But we need the output to be hello:
$ cat output.txt
first line
pre hello post
last line
I'm trying to use printf in sed, but cannot find out how to pass the backreference \1 to printf:
sed 's/(cid:\([^\)]*\))/'`printf "\x$(printf %x \1)"`'/g'
Upvotes: 2
Views: 984
Reputation: 23667
$ cat input.txt
first line
pre (cid:104)(cid:101)(cid:108)(cid:108)(cid:111) post
last line
$ perl -pe 's/\(cid:(\d+)\)/chr($1)/ge' input.txt > output.txt
$ cat output.txt
first line
pre hello post
last line
Thanks @123 for suggesting chr($1) instead of sprintf "%c", $1. See chr for documentation.
Reference: Integer ASCII value to character in BASH using printf
Upvotes: 3
Reputation: 289835
Using %c you can convert an ASCII code into its corresponding character:
$ awk 'BEGIN {printf "%c", 104}'
h
So it is a matter of extracting the numbers from within (cid:XX). I do this by setting FS to ( and looping through the fields:
awk -v FS='(' '{for (i=2; i<=NF; i++) {
r=gensub(/cid:([0-9]+)\)/, "\\1", "g", $i);
printf "%c", r+0
}
}' file
This uses gensub() and accesses the captured group as described in GNU awk: accessing captured groups in replacement text, so it depends on GNU awk.
For your given input it prints only the decoded characters, without the surrounding text:
$ awk -v FS='(' '{for (i=2; i<=NF; i++) {r=gensub(/cid:([0-9]+)\)/, "\\1", "g", $i); printf "%c", r+0}}' file
hello
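If the surrounding text has to be kept, as in the output.txt from the question, one possible approach is to walk each line with match() and substr() instead of splitting on (. This is only a sketch: the function name decode_cid is made up here, and it relies on awk's printf-style %c turning a numeric argument into the character with that code, as shown at the top of this answer. It does not need gensub().

```shell
# Hypothetical helper (name is an assumption): replace each (cid:N) in place.
decode_cid() {
  awk '{
    out = ""
    # Consume the line from left to right, one (cid:N) token at a time.
    while (match($0, /\(cid:[0-9]+\)/)) {
      # "(cid:" is 5 characters and ")" is 1, so the digits start at
      # RSTART + 5 and are RLENGTH - 6 characters long.
      code = substr($0, RSTART + 5, RLENGTH - 6)
      out = out substr($0, 1, RSTART - 1) sprintf("%c", code + 0)
      $0 = substr($0, RSTART + RLENGTH)
    }
    # Whatever is left has no more tokens; lines without tokens pass through.
    print out $0
  }' "$@"
}

# Usage: reads the named files, or standard input when none are given.
printf 'pre (cid:104)(cid:101)(cid:108)(cid:108)(cid:111) post\n' | decode_cid
```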
Upvotes: 0