adrotter

Reputation: 311

AWK: Use First Line's Info to prepend each line

I have an awk one-liner inside a bash script and need help getting it to work. Here's what I tried, but it doesn't work.

Here's my data.vcf:

. abc hji ran kls
CHR1 0/0 0/0 0/1 0/0
CHR2 0/1 0/0 0/0 0/0
CHR3 1/1 0/0 0/0 0/0
CHR4 0/0 0/0 0/0 1/1

Code I have so far (Other bash code is unrelated):

awk '{ for (i=1; i<=NF; ++i) { if ($i ~ "/1") print NR==1 $i," ",$0} }' data.vcf

this prints:

 . abc hji ran kls
 0 CHR1 0/0 0/0 0/1 0/0
 0 CHR2 0/1 0/0 0/0 0/0
 0 CHR3 1/1 0/0 0/0 0/0
 0 CHR4 0/0 0/0 0/0 1/1

I would like it to print this:

 . abc hji ran kls
 ran CHR1 0/0 0/0 0/1 0/0
 abc CHR2 0/1 0/0 0/0 0/0
 abc CHR3 1/1 0/0 0/0 0/0
 kls CHR4 0/0 0/0 0/0 1/1

Basically, I want to prepend a field plus a space to each line, but the field should be $i taken from the first line (the header), not from the current line. Thanks for your help.

Upvotes: 0

Views: 263

Answers (2)

Jotne

Reputation: 41460

Here is another variation:

awk 'NR==1 {for (i=2;i<=NF;i++) a[i]=$i;print;next} {for (i=2;i<=NF;i++) if ($i~"/1") $1=a[i]FS$1}1' file
. abc hji ran kls
ran CHR1 0/0 0/0 0/1 0/0
abc CHR2 0/1 0/0 0/0 0/0
abc CHR3 1/1 0/0 0/0 0/0
kls CHR4 0/0 0/0 0/0 1/1

How it works:

awk '
NR==1 {                     # For line "1" (the header)
    for (i=2;i<=NF;i++)     # Loop through fields 2..NF
        a[i]=$i             # Store them in an array "a" using field number as index
    print                   # Print the line
    next}                   # Do nothing more with line "1"
    {for (i=2;i<=NF;i++)    # Loop through fields 2..NF of every other line
        if ($i~"/1")        # If the field contains "/1"
            $1=a[i]FS$1}    # Prepend the header name for that field number to field 1
1                           # Print every line
' file                      # Read the file
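One thing to be aware of: because $1 is rewritten on every match, a line with "/1" in more than one column gets every matching header name prepended, with the later columns ending up first. A small sketch to show this, using a made-up input line (not from the question) where both the abc and ran columns match:

```shell
# Hypothetical input: the CHR1 line has "/1" in two columns (abc and ran).
input='. abc hji ran kls
CHR1 0/1 0/0 0/1 0/0'

# Same logic as the one-liner above, reading from stdin instead of a file.
result=$(printf '%s\n' "$input" | awk '
    NR==1 {for (i=2;i<=NF;i++) a[i]=$i; print; next}
    {for (i=2;i<=NF;i++) if ($i~"/1") $1=a[i]FS$1}
    1')
printf '%s\n' "$result"
```

The second output line is "ran abc CHR1 0/1 0/0 0/1 0/0". If you only ever have one matching column per line, as in the sample data, this does not matter.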

Upvotes: 1

paxdiablo

Reputation: 882146

The following testprog.awk script will give you what you want:

NR==1 {
    for (i = 2; i <= NF; i++) {
        txt[i] = $i;
    }
    print $0;
}
NR > 1 {
    pos = 0;
    for (i = 2; i <= NF; i++) {
        if ($i != "0/0") {
            pos = i;
        }
    }
    print txt[pos]" "$0;
}

It uses the first record to create an array of the header columns then, for all other records, it looks for the column that isn't 0/0 and stores the position.

It then uses that position to look up the text to prefix the line with.

The output from your given test data is:

pax> awk -f testprog.awk testprog.in
. abc hji ran kls
ran CHR1 0/0 0/0 0/1 0/0
abc CHR2 0/1 0/0 0/0 0/0
abc CHR3 1/1 0/0 0/0 0/0
kls CHR4 0/0 0/0 0/0 1/1

Now, there may be a bit of tweaking required if I haven't quite got the selection criteria right, if ($i != "0/0"), but that should be a fairly minimal change. It also selects the last matching column if there is more than one possible match so, if that's a possibility, you should specify the behaviour you want in that case.
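For example, if you would rather take the first matching column than the last, a minimal sketch (assuming that is the behaviour you want) adds a break to the loop. This variant also leaves a line unprefixed when nothing matches, which is my own choice rather than anything the question asks for. The input below is made up to show both cases:

```shell
# Hypothetical sample: CHR1 has two non-0/0 columns, CHR2 has none.
input='. abc hji ran kls
CHR1 0/1 0/0 0/1 0/0
CHR2 0/0 0/0 0/0 0/0'

result=$(printf '%s\n' "$input" | awk '
    NR == 1 {
        for (i = 2; i <= NF; i++) txt[i] = $i   # remember header names by column
        print
        next
    }
    {
        pos = 0
        for (i = 2; i <= NF; i++) {
            if ($i != "0/0") { pos = i; break } # stop at the first match
        }
        prefix = ""
        if (pos) prefix = txt[pos] " "          # pos stays 0 when no column matched
        print prefix $0
    }')
printf '%s\n' "$result"
```

With this input, the CHR1 line comes out prefixed with abc (the first match, not ran) and the CHR2 line is printed unchanged.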


To do this in a bash script rather than needing a separate awk script, just use:

awk '
    NR==1 {
        for (i = 2; i <= NF; i++) {
            txt[i] = $i;
        }
        print $0;
    }
    NR > 1 {
        pos = 0;
        for (i = 2; i <= NF; i++) {
            if ($i != "0/0") {
                pos = i;
            }
        }
        print txt[pos]" "$0;
    }' testprog.in

or, if you really want a one-liner, it'll be a long-ish line, not quite as readable as the fully expanded variant:

awk 'NR==1{for(i=2;i<=NF;i++){t[i]=$i}print $0}NR>1{p=0;for(i=2;i<=NF;i++){if($i!="0/0"){p=i}}print t[p]" "$0}' testprog.in

Upvotes: 3
