
Multiple passes with awk and execution order

Two part question:

Part One: I have a sequence AATTCCGG which I want to change to TTAAGGCC. I used gsub to change A to T, C to G, G to C and T to A. Unfortunately awk executes these substitutions one after another, so the second rule undoes the first and I ended up with AAAACCCC. I got around this by using upper and lower case, then converting back to upper case values, but I would like to do this in a single step if possible.

example:

echo AATTCCGG | awk '{gsub("A","T",$1);gsub("T","A",$1);gsub("C","G",$1);gsub("G","C",$1);print $0}'

OUTPUT: AAAACCCC
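For comparison, the same mapping can be done in a single step outside awk with the standard `tr` utility, which translates every character at once rather than rule by rule. A minimal sketch, assuming uppercase input (any other characters pass through unchanged); the function name `complement` is made up for illustration:

```shell
# complement is a hypothetical helper name; it reads sequences on stdin.
complement() {
    # tr maps every character of the first set to the character at the
    # same position in the second set in one pass, so an A that became
    # T is never turned back into A by a later rule.
    tr 'ACGT' 'TGCA'
}

echo AATTCCGG | complement    # prints TTAAGGCC
```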

Part Two: Is there a way to get awk to run through to the end of a file with one set of instructions before starting a second set? I tried some of the following, but with no success.

For the data set

1 A
2 B
3 C
4 D
5 E

I am using the following pipe to get the data I want (just an example):

awk '{if ($1%2==0)print $1,"E";else print $0}' test | awk '{if ($1%2==0 && $2=="E") print $0}'

I am using a pipe to run awk a second time; however, I have found that it is quicker if I don't have to rerun the program.
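Because awk runs every rule on a line before it reads the next line, the two steps in this pipeline can be fused into one awk program whenever the second step only looks at the current line. When the second step really needs all of the first step's output, one option is to store the transformed lines in an array and run the second step in the `END` block. A minimal sketch of both, assuming the two-column data shown above; `one_pass` and `end_pass` are hypothetical names:

```shell
# one_pass: step 2 sees the result of step 1 on the same line.
one_pass() {
    awk '{
        if ($1 % 2 == 0) $0 = $1 " E"          # step 1: rewrite even rows
        if ($1 % 2 == 0 && $2 == "E") print   # step 2: filter the result
    }' "$@"
}

# end_pass: step 1 runs over the whole input before step 2 starts.
end_pass() {
    awk '
        { line[NR] = (($1 % 2 == 0) ? $1 " E" : $0) }   # step 1, every line
        END {                                          # runs after EOF
            for (i = 1; i <= NR; i++) {
                split(line[i], f, " ")
                if (f[1] % 2 == 0 && f[2] == "E") print line[i]   # step 2
            }
        }' "$@"
}

# Usage with the sample data from the question
printf '1 A\n2 B\n3 C\n4 D\n5 E\n' | one_pass
printf '1 A\n2 B\n3 C\n4 D\n5 E\n' | end_pass
```

Both print `2 E` and `4 E`. The `END` version keeps the whole file in memory. Another common idiom is to name the file twice on the command line (`awk '...' test test`) and use `NR == FNR` to tell the first read from the second.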


Answers (3)

jeffpkamp


Here is a method I have found for the first part of the question using awk. It uses an array as a lookup table and a for loop over each character.

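cat sub.awk
awk '
    # Lookup table of complements. FS="" gives one field per character
    # (a gawk/mawk extension; POSIX leaves empty FS unspecified).
    BEGIN { d["G"]="C"; d["C"]="G"; d["T"]="A"; d["A"]="T"; FS=""; OFS="" }
    {
        # Replace each base with its complement; other characters are kept.
        for (i = 1; i <= NF; i++)
            if ($i in d)
                $i = d[$i]
        print    # fields are rejoined with OFS="", i.e. no separator
    }'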

Input/Output:
ATCG
TAGC
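Setting `FS` to the empty string is an extension that gawk and mawk support but POSIX does not define. A sketch of the same lookup-table idea that should work in any POSIX awk walks the line with `substr` instead; the name `awk_complement` is hypothetical:

```shell
# awk_complement is a hypothetical helper name; it reads lines on stdin
# (or from files given as arguments) and prints their complements.
awk_complement() {
    awk 'BEGIN { d["A"] = "T"; d["T"] = "A"; d["C"] = "G"; d["G"] = "C" }
    {
        out = ""
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            r = (c in d) ? d[c] : c    # unknown characters pass through
            out = out r
        }
        print out
    }' "$@"
}

echo ATCG | awk_complement        # prints TAGC
echo AATTCCGG | awk_complement    # prints TTAAGGCC
```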
