Reputation: 2866
Two part question:
Part One: First I have a sequence AATTCCGG which I want to change to TAAGGCC. I used gsub to change A to T, C to G, G to C and T to A. Unfortunetly awk executes these orders sequentially, so I ended up with AAACCCC. I got around this by using upper and lower case, then converting back to upper case values, but I would like to do this in a single step if possible.
example:
echo AATTCCGG | awk '{gsub("A","T",$1);gsub("T","A",$1);gsub("C","G",$1);gsub("G","C",$1);print $0}'
OUTPUT: AAAACCCC
Part Two: Is there a way to get awk to run to the end of a file for one set of instructions before starting a second set? I tried some of the following, but with no success
for the data set
1 A
2 B
3 C
4 D
5 E
I am using the following pipe to get the data I want (Just an example)
awk '{if ($1%2==0)print $1,"E";else print $0}' test | awk '{if ($1%2==0 && $2=="E") print $0}'
I am using a pipe to rerun the program, however I have found that it is quicker if I don't have to rerun the program.
Upvotes: 2
Views: 495
Reputation: 2866
Here is a method I have found for the first part of the question using awk. It uses an array and a for loop.
cat sub.awk
awk '
BEGIN{d["G"]="C";d["C"]="G";d["T"]="A";d["A"]="T";FS="";OFS=""}
{for(i=1;i<(NF+1);i++)
{if($i in d)
$i=d[$i]}
}
{print}'
Input/Output:
ATCG
TAGC
Upvotes: 0
Reputation: 45293
for part two, try this command:
awk '{if ($1%2==0)print $1,"E"}' test
Upvotes: 0
Reputation: 26194
This can be efficiently solved with tr
:
$ echo AATTCCGG | tr ATCG TAGC
Regarding part two (this should be a different question, really): no, it is not possible with awk
, pipe is the way to go.
Upvotes: 3