Reputation: 311
I have an awk one-liner inside a bash script and need help getting it right. Here's what I tried, but it doesn't work.
Here's my data.vcf:
. abc hji ran kls
CHR1 0/0 0/0 0/1 0/0
CHR2 0/1 0/0 0/0 0/0
CHR3 1/1 0/0 0/0 0/0
CHR4 0/0 0/0 0/0 1/1
Code I have so far (the other bash code is unrelated):
awk '{ for (i=1; i<=NF; ++i) { if ($i ~ "/1") print NR==1 $i," ",$0} }' data.vcf
This prints:
. abc hji ran kls
0 CHR1 0/0 0/0 0/1 0/0
0 CHR2 0/1 0/0 0/0 0/0
0 CHR3 1/1 0/0 0/0 0/0
0 CHR4 0/0 0/0 0/0 1/1
I would like it to print this:
. abc hji ran kls
ran CHR1 0/0 0/0 0/1 0/0
abc CHR2 0/1 0/0 0/0 0/0
abc CHR3 1/1 0/0 0/0 0/0
kls CHR4 0/0 0/0 0/0 1/1
Basically, for each line, take the column that contains "/1", look up that column's value ($i) on the first line, and prepend it to the line, followed by a space. Thanks for your help.
Upvotes: 0
Views: 263
Reputation: 41460
Here is another variation:
awk 'NR==1 {for (i=2;i<=NF;i++) a[i]=$i;print;next} {for (i=2;i<=NF;i++) if ($i~"/1") $1=a[i]FS$1}1' file
. abc hji ran kls
ran CHR1 0/0 0/0 0/1 0/0
abc CHR2 0/1 0/0 0/0 0/0
abc CHR3 1/1 0/0 0/0 0/0
kls CHR4 0/0 0/0 0/0 1/1
How it works:
awk '
NR==1 {                     # For line 1 (the header)
    for (i=2;i<=NF;i++)     # Loop through the fields, skipping the first
        a[i]=$i             # Store each one in array "a", keyed by field number
    print                   # Print the line
    next}                   # Do nothing more with line 1
{   for (i=2;i<=NF;i++)     # Loop through the fields of every other line
        if ($i~"/1")        # If the field contains "/1"
            $1=a[i]FS$1}    # Prepend the header stored for that field number to field 1
1                           # Print the line
' file                      # Read the file
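One consequence of the loop above is worth seeing on a concrete row: because every field containing "/1" rewrites $1, a row with several matches gets several headers prepended, with the last match ending up leftmost. The sketch below wraps the same one-liner in a shell function so it can be fed a test row; the function name prepend_all_matches and the two-match row are made up for illustration and are not part of the original data.

```shell
# Hypothetical wrapper around the one-liner above; reads files or stdin.
prepend_all_matches() {
    awk 'NR==1 {for (i=2;i<=NF;i++) a[i]=$i;print;next} {for (i=2;i<=NF;i++) if ($i~"/1") $1=a[i]FS$1}1' "$@"
}

# Invented test row with two matching columns (abc and ran).
printf '%s\n' '. abc hji ran kls' 'CHR1 0/1 0/0 1/1 0/0' | prepend_all_matches
# Second line printed: ran abc CHR1 0/1 0/0 1/1 0/0
```

If only one prefix per row is wanted, the loop would need to stop after the first match.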
Upvotes: 1
Reputation: 882146
The following testprog.awk script will give you what you want:
NR==1 {
    for (i = 2; i <= NF; i++) {
        txt[i] = $i;
    }
    print $0;
}
NR > 1 {
    pos = 0;
    for (i = 2; i <= NF; i++) {
        if ($i != "0/0") {
            pos = i;
        }
    }
    print txt[pos]" "$0;
}
It uses the first record to create an array of the header columns. Then, for all other records, it looks for the column that isn't 0/0 and stores the position. It then uses that position to look up the text to prefix the line with.
The output from your given test data is:
pax> awk -f testprog.awk testprog.in
. abc hji ran kls
ran CHR1 0/0 0/0 0/1 0/0
abc CHR2 0/1 0/0 0/0 0/0
abc CHR3 1/1 0/0 0/0 0/0
kls CHR4 0/0 0/0 0/0 1/1
Now, there may be a bit of tweaking required if I haven't quite got the selection criteria right (the if ($i != "0/0") test), but that should be a fairly minimal change. It also selects the last matching column if there is more than one possible match so, if that's a possibility, you should specify the behaviour you want in that case.
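As one possible answer to the multiple-match question, here is a minimal sketch that keeps the first non-0/0 column instead of the last by breaking out of the loop. The function name prefix_first_match is hypothetical, and printing rows with no match unchanged is an assumption about the desired behaviour, not something the question specifies.

```shell
# Hypothetical variant: prefix each row with the header of the FIRST
# column that is not "0/0" (the script above keeps the last one).
prefix_first_match() {
    awk '
    NR == 1 {
        for (i = 2; i <= NF; i++) txt[i] = $i   # remember header names by column
        print
        next
    }
    {
        pos = 0
        for (i = 2; i <= NF; i++) {
            if ($i != "0/0") { pos = i; break } # stop at the first match
        }
        # Assumption: rows with no match are printed without a prefix.
        prefix = pos ? txt[pos] " " : ""
        print prefix $0
    }' "$@"
}

# Invented rows: one with two candidate columns, one with none.
printf '%s\n' '. abc hji ran kls' 'CHR1 0/1 0/0 1/1 0/0' 'CHR2 0/0 0/0 0/0 0/0' |
    prefix_first_match
```

With the last-match logic the first data row would be prefixed with ran; here it is prefixed with abc.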
To do this in a bash script rather than needing a separate awk script, just use:
awk '
NR==1 {
    for (i = 2; i <= NF; i++) {
        txt[i] = $i;
    }
    print $0;
}
NR > 1 {
    pos = 0;
    for (i = 2; i <= NF; i++) {
        if ($i != "0/0") {
            pos = i;
        }
    }
    print txt[pos]" "$0;
}' testprog.in
or, if you really want a one-liner, it'll be a long-ish line, not quite as readable as the fully expanded variant:
awk 'NR==1{for(i=2;i<=NF;i++){t[i]=$i}print $0}NR>1{p=0;for(i=2;i<=NF;i++){if($i!="0/0"){p=i}}print t[p]" "$0}' testprog.in
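Since this lives inside a larger bash script, the selection value may come from a shell variable rather than being hard-coded. A rough sketch of that, using awk's standard -v option to pass the value in; the names ref and prefix_by_header are illustrative only and do not come from the original post.

```shell
# Hypothetical shell variable holding the genotype to treat as "no match".
ref='0/0'

# Hypothetical wrapper; same last-match logic as the one-liner above,
# with the comparison value supplied through -v instead of a literal.
prefix_by_header() {
    awk -v ref="$ref" '
    NR == 1 { for (i = 2; i <= NF; i++) t[i] = $i; print; next }
    {
        p = 0
        for (i = 2; i <= NF; i++) if ($i != ref) p = i   # last non-ref column wins
        print t[p] " " $0
    }' "$@"
}

printf '%s\n' '. abc hji ran kls' 'CHR4 0/0 0/0 0/0 1/1' | prefix_by_header
# Second line printed: kls CHR4 0/0 0/0 0/0 1/1
```

Passing the value with -v avoids splicing shell text into the awk program, which keeps the single-quoted script intact.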
Upvotes: 3