user2606364

Reputation: 13

Converting string of ASCII characters to string of corresponding decimals

Let me introduce the problem that destroyed my weekend. I have biological data in 4 columns:

@ID:::12345/1 ACGACTACGA text !"#$%vwxyz  
@ID:::12345/2 TATGACGACTA text :;<=>?VWXYZ

I would like to use awk to edit the first column, replacing the characters : and / with -.
I would also like to convert the string in the last column into a comma-separated string of the decimal codes of its individual ASCII characters (any character in the range ASCII 33-126), giving:

@ID---12345-1 ACGACTACGA text 33,34,35,36,37,118,119,120,121,122  
@ID---12345-2 TATGACGACTA text 58,59,60,61,62,63,86,87,88,89,90

The first part is easy, but I'm stuck on the second. I've tried the awk ordinal functions and sprintf: I can only get the former to work on the first character of the string, and I can only get the latter to convert hexadecimal to decimal, and not on a string containing spaces. I also tried this shell command:

$ od -t d1 test3 | awk 'BEGIN{OFS=","}{i = $1; $1 = ""; print $0}' 

But I don't know how to call it from within awk. I would prefer to use awk, as I have some downstream manipulations that can also be done in awk.

Many thanks in advance

Upvotes: 0

Views: 1181

Answers (2)

dogbane

Reputation: 274562

Using the ordinal functions from the gawk manual, you can do it like this (--source is a gawk-specific option):

gawk -f ord.awk --source '{
    # replace : and / with - in the first field
    gsub(/[:\/]/,"-",$1)

    # calculate the ordinals by looping over the characters in the fourth field
    res=ord($4)
    for(i=2;i<=length($4);i++) {
        res=res","ord(substr($4,i,1))
    }
    $4=res
}1' file

Output:

@ID---12345-1 ACGACTACGA text 33,34,35,36,37,118,119,120,121,122
@ID---12345-2 TATGACGACTA text 58,59,60,61,62,63,86,87,88,89,90

Here is ord.awk, taken as-is from the gawk manual (http://www.gnu.org/software/gawk/manual/html_node/Ordinal-Functions.html):

# ord.awk --- do ord and chr

# Global identifiers:
#    _ord_:        numerical values indexed by characters
#    _ord_init:    function to initialize _ord_



BEGIN    { _ord_init() }

function _ord_init(    low, high, i, t)
{
    low = sprintf("%c", 7) # BEL is ascii 7
    if (low == "\a") {    # regular ascii
        low = 0
        high = 127
    } else if (sprintf("%c", 128 + 7) == "\a") {
        # ascii, mark parity
        low = 128
        high = 255
    } else {        # ebcdic(!)
        low = 0
        high = 255
    }

    for (i = low; i <= high; i++) {
        t = sprintf("%c", i)
        _ord_[t] = i
    }
}

function ord(str, c)
{
    # only first character is of interest
    c = substr(str, 1, 1)
    return _ord_[c]
}

function chr(c)
{
    # force c to be numeric by adding 0
    return sprintf("%c", c + 0)
}
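ord.awk also defines chr, the inverse of ord, which is just sprintf("%c", c + 0). As a sketch (decode_codes is a hypothetical helper name, not part of ord.awk), the same sprintf call can turn the comma-separated decimals back into the original string, which is a handy way to check the conversion. It assumes the codes are in the last field:

```shell
# Sketch: rebuild the original characters from a comma-separated list of codes.
# decode_codes is a hypothetical helper; it applies chr's sprintf("%c", n) to each code.
decode_codes() {
    awk '{
        n = split($NF, nums, ",")              # split the last field on commas
        s = ""
        for (i = 1; i <= n; i++)
            s = s sprintf("%c", nums[i] + 0)   # same conversion as chr()
        $NF = s                                # replace the codes with the characters
    }
    1' "$@"
}
```

Note that the first field cannot be restored this way, since both : and / were mapped to the same -.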

If you don't want to keep a separate ord.awk file, you can inline the initialization and index the _ord_ array directly:

awk 'BEGIN{ _ord_init()}
function _ord_init(low, high, i, t)
{
    low = sprintf("%c", 7) # BEL is ascii 7
    if (low == "\a") {    # regular ascii
        low = 0
        high = 127
    } else if (sprintf("%c", 128 + 7) == "\a") {
        # ascii, mark parity
        low = 128
        high = 255
    } else {        # ebcdic(!)
        low = 0
        high = 255
    }

    for (i = low; i <= high; i++) {
        t = sprintf("%c", i)
        _ord_[t] = i
    }
}
{
    # replace : and / with - in the first field
    gsub(/[:\/]/,"-",$1)

    # calculate the ordinal by looping over the characters in the fourth field
    res=_ord_[substr($4,1,1)]
    for(i=2;i<=length($4);i++) {
        res=res","_ord_[substr($4,i,1)]
    }
    $4=res
}1' file
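The _ord_init table covers the whole character set and probes for ASCII versus EBCDIC. Since the question only needs printable ASCII 33-126, a smaller table is enough. The following is a sketch under that assumption, intended to need only POSIX awk features (no --source); codes_to_csv and to_decimals are hypothetical names chosen for illustration:

```shell
# Sketch: POSIX awk version; assumes only printable ASCII (33..126) in the last field.
# codes_to_csv and to_decimals are hypothetical names.
codes_to_csv() {
    awk '
    BEGIN {
        # lookup table: character -> decimal code, printable ASCII only
        for (i = 33; i <= 126; i++)
            code[sprintf("%c", i)] = i
    }
    # return the codes of every character in s, joined by commas
    function to_decimals(s,    i, out) {
        out = code[substr(s, 1, 1)]
        for (i = 2; i <= length(s); i++)
            out = out "," code[substr(s, i, 1)]
        return out
    }
    {
        gsub("[:/]", "-", $1)    # string regex: no need to escape the slash
        $NF = to_decimals($NF)   # replace the last field with its codes
    }
    1' "$@"
}
```

Usage: codes_to_csv file, or pipe the data in on standard input. Restricting the table to 33-126 keeps it small; any character outside that range would map to an empty string.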

Upvotes: 1

choroba

Reputation: 241828

Perl solution:

perl -lnae '$F[0] =~ s%[:/]%-%g; $F[-1] =~ s/(.)/ord($1) . ","/ge; chop $F[-1]; print "@F";' < input

The first substitution replaces : and / in the first field with a dash; the second replaces each character in the last field with its ordinal value followed by a comma; chop then removes the trailing comma.

Upvotes: 0
