George Pacheco

Reputation: 39

AWK Loop Over Multiple Columns

Please pretend I have the following situation (multiple columns & rows):

1/1:123:121 TAB 0/0:1:21 TAB 1/1:12:14
0/1:12:23 TAB 0/1:12:15 TAB 0/0:123:16
0/0:3:178 TAB 1/1:123:121 TAB 1/1:2:28

What I would like is for awk to loop over each column and write new output under these conditions:

IF the first subfield (subfields are separated by ":") is 1/1 OR 0/0,

then write "NA" TAB "NA"

ELSE

write the two numbers from the following subfields, "Number 1" TAB "Number 2". The separator between columns should be TAB.

Thus, the desired output for the example above would be:

NA TAB NA TAB NA TAB NA TAB NA TAB NA
12 TAB 23 TAB 12 TAB 15 TAB NA TAB NA
NA TAB NA TAB NA TAB NA TAB NA TAB NA

Below is my current code, which works for the first column, but I do not know how to make it work for ALL columns in the file.

awk '{split($0,a,":"); print a[1]"\t"a[2]"\t"a[3]}' |
awk -F"\t" '{
    if ($1 == "0/0" || $1 == "1/1")
        print $1="NA", $2="NA"
    else
        print $2"\t"$3
}'

Any ideas of how this could be achieved?

Many thanks in advance, George.

Upvotes: 1

Views: 1052

Answers (4)

tomc

Reputation: 1207

A sed solution:

sed 's~\(0/0\|1/1\)[0-9:]\+~NA\tNA~g; s~./.:\([0-9]\+\)\:\([0-9]\+\)~\1\t\2~g' dat.tab

NA  NA  NA  NA  NA  NA
12  23  12  15  NA  NA
NA  NA  NA  NA  NA  NA

The first substitution replaces every field beginning with '0/0' or '1/1' with NA TAB NA.
The second substitution drops the leading genotype from each remaining field and emits its two trailing colon-separated numbers, separated by a TAB.

(Output spacing tidied up for display.)
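
To make the two steps easier to follow, here is a small sketch that runs them one at a time on the second sample line. It assumes GNU sed, since `\|`, `\+` and `\t` as used here are GNU extensions; the variable names `line`, `step1` and `step2` are just placeholders for this illustration.

```shell
# Assumes GNU sed: \| , \+ and \t are GNU extensions in this syntax.
# Second line of the sample, with real tab characters.
line=$(printf '0/1:12:23\t0/1:12:15\t0/0:123:16')

# Step 1 only: fields starting with 0/0 or 1/1 become NA<TAB>NA.
step1=$(printf '%s\n' "$line" | sed 's~\(0/0\|1/1\)[0-9:]\+~NA\tNA~g')

# Step 2 on that result: drop the genotype, keep the two numbers.
step2=$(printf '%s\n' "$step1" | sed 's~./.:\([0-9]\+\):\([0-9]\+\)~\1\t\2~g')

printf '%s\n%s\n' "$step1" "$step2"
```

After step 1 only the last field has changed; step 2 then reduces the two remaining 0/1 fields to their numbers and leaves the NA pairs alone, because they no longer contain a slash.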

Upvotes: 0

anubhava

Reputation: 785276

You may use this awk:

awk -v OFS='\t' -F '[:\t]' '{
   s = ""
   for (i=1; i<=NF; i+=3)
      s = (s == "" ? "" : s OFS) ($i == "0/0" || $i == "1/1" ? "NA" OFS "NA" : $(i+1) OFS $(i+2))
   print s
}' file

NA  NA  NA  NA  NA  NA
12  23  12  15  NA  NA
NA  NA  NA  NA  NA  NA
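
As a rough illustration of why the loop steps by 3: with `-F '[:\t]'` both the colon and the tab act as field separators, so each input line is flattened into nine fields and the genotypes sit in fields 1, 4 and 7. The helper name `show_fields` below is made up for this sketch.

```shell
# Show how -F '[:\t]' flattens one sample line into nine fields,
# so that every third field (1, 4, 7) holds a genotype.
show_fields() {
    printf '0/1:12:23\t0/1:12:15\t0/0:123:16\n' |
    awk -F '[:\t]' '{
        print "NF=" NF
        for (i = 1; i <= NF; i += 3)
            print i ": " $i " " $(i+1) " " $(i+2)
    }'
}

show_fields
```

This should print NF=9 followed by one line per triple (starting at fields 1, 4 and 7), which is exactly the grouping the ternary expression in the answer tests.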

Upvotes: 1

tshiono

Reputation: 22022

If I'm understanding your notation of TAB correctly, would you please try:

awk -F"\t" '{
    for (i = 1; i <= NF; i++) {
        split($i, a, ":")
        if (a[1] == "0/0" || a[1] == "1/1") a[2] = a[3] = "NA"
        printf "%s\t%s%s", a[2], a[3], i == NF ? "\n" : "\t"
    }
}' input_file

where input_file looks like:

1/1:123:121     0/0:1:21        1/1:12:14
0/1:12:23       0/1:12:15       0/0:123:16
0/0:3:178       1/1:123:121     1/1:2:28

and the output:

NA      NA      NA      NA      NA      NA
12      23      12      15      NA      NA
NA      NA      NA      NA      NA      NA
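
If it helps to reproduce this, the following sketch builds the sample with real tab characters and runs the same logic on it. The temporary directory, the file name `sample.tsv` and the function name `convert` are placeholders I picked for the example, and I have parenthesized the ternary in `printf` as a precaution for stricter awk implementations.

```shell
# Build the three-line sample with real tab characters.
# tmpdir and sample.tsv are placeholder names for this sketch.
tmpdir=$(mktemp -d)
printf '1/1:123:121\t0/0:1:21\t1/1:12:14\n'  >  "$tmpdir/sample.tsv"
printf '0/1:12:23\t0/1:12:15\t0/0:123:16\n'  >> "$tmpdir/sample.tsv"
printf '0/0:3:178\t1/1:123:121\t1/1:2:28\n'  >> "$tmpdir/sample.tsv"

# Same per-column logic as above, wrapped in a function for reuse.
convert() {
    awk -F"\t" '{
        for (i = 1; i <= NF; i++) {
            split($i, a, ":")                      # a[1]=genotype, a[2], a[3]=numbers
            if (a[1] == "0/0" || a[1] == "1/1") a[2] = a[3] = "NA"
            printf "%s\t%s%s", a[2], a[3], (i == NF ? "\n" : "\t")
        }
    }' "$1"
}

convert "$tmpdir/sample.tsv"
```

Piping the result through `cat -A` (GNU coreutils) is a quick way to confirm that the separators really are tabs rather than spaces.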

Upvotes: 1

George Pacheco

Reputation: 39

One possible solution:

awk '{ for (i=1; i<=NF; i++) { split($i, a, ":"); if (a[1] == "0/0" || a[1] == "1/1") {printf "\tNA\tNA"} else {printf "\t%s\t%s", a[2], a[3]} } print "" }' | cut -f2- > Test.txt

Upvotes: 0
