Reputation: 27
I would like to find the smallest value in column $3 for each block that starts with a "header line" of the form '@ value1 value2'. The file looks like this:
@ 62.65 -50.35
0 1.50 1.676
1.67 1.50 1.677
1.67 2.25 1.423
2.90 2.25 2.902
2.90 4.95 2.903
3.04 4.95 3.049
@ 63.61 -50.45
0 1.50 1.654
3.42 1.50 1.875
3.43 2.19 3.430
5.31 2.19 1.032
5.32 6.23 5.320
5.43 6.23 5.434
After this, I want to print the header as well as the whole line with the smallest $3.
So my output file should look like this:
@ 62.65 -50.35
1.67 2.25 1.423
@ 63.61 -50.45
5.31 2.19 1.032
This is what I tried so far:
awk '{if (!/^@/) {if ($3 < min) {$3=min;} print $1, $2, $3, min; min = $3;} else if (/^@/) min = 10; print $1 $2 $3;}' input.txt > output.txt
I am having trouble separating the blocks and "resetting" min (I tried setting the start value high, i.e. 10).
I am new to programming and have mainly used awk so far. If you could help me out, that would be really great! Thanks so much! Cheers, Isi
Upvotes: 2
Views: 356
Reputation: 881303
Awk can do this quite easily, as per the following script:
awk '
$1=="@" { first=1; key=$0; next }
first==1 { lowest=$3; line[key]=$0; first=0; next }
{ if ($3 < lowest) { lowest=$3; line[key]=$0 } }
END { for (key in line) { printf "%s\n%s\n", key, line[key] } }
' <<EOF
@ 62.65 -50.35
0 1.50 1.676
1.67 1.50 1.677
1.67 2.25 1.423
2.90 2.25 2.902
2.90 4.95 2.903
3.04 4.95 3.049
@ 63.61 -50.45
0 1.50 1.654
3.42 1.50 1.875
3.43 2.19 3.430
5.31 2.19 1.032
5.32 6.23 5.320
5.43 6.23 5.434
EOF
As requested, the output is:
@ 62.65 -50.35
1.67 2.25 1.423
@ 63.61 -50.45
5.31 2.19 1.032
Breaking out the code for clarification:
$1=="@" {                        # For all header lines:
    first=1                      # Flag that you're starting a new block.
    key=$0                       # Save the key.
    next                         # Go back for next line.
}
first==1 {                       # For first line in each block:
    lowest=$3                    # It must be lowest.
    line[key]=$0                 # Store line.
    first=0                      # Now processing subsequent lines in block.
    next                         # Go back for next line.
}
{                                # For non-first-lines-in-block:
    if ($3 < lowest) {           # Only if this one is lower.
        lowest=$3                # Store value and line.
        line[key]=$0
    }
}
END {                            # At end, simply output associative array.
    for (key in line) {
        printf "%s\n%s\n", key, line[key]
    }
}
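Keep in mind this assumes the header lines are unique. If there can be duplicates and you want to treat them distinctly, you can create the key from a combination of NR and $0. Also note that the order in which for (key in line) visits the keys is unspecified in awk, so the blocks are not guaranteed to print in input order.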
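A minimal sketch of the NR-plus-$0 key idea, assuming repeated headers should count as separate blocks and that output should follow input order. The function name block_min and the arrays order and part are made up for illustration; they are not part of the script above.

```shell
# Hypothetical helper: print each header followed by the line with the smallest $3.
block_min() {
  awk '
    $1 == "@" {                        # header line: start a new block
      key = NR SUBSEP $0               # NR keeps the key unique for repeated headers
      order[++n] = key                 # remember the order in which blocks appear
      first = 1
      next
    }
    key != "" && (first || $3 + 0 < lowest) {
      lowest = $3 + 0                  # force a numeric comparison
      line[key] = $0
      first = 0
    }
    END {
      for (i = 1; i <= n; i++) {       # walk blocks in input order
        key = order[i]
        if (!(key in line)) continue   # header without any data lines
        split(key, part, SUBSEP)       # part[2] holds the original header text
        printf "%s\n%s\n", part[2], line[key]
      }
    }
  ' "$@"
}

# Made-up input with the same header twice:
printf '@ a b\n1 1 5\n1 1 3\n@ a b\n2 2 9\n2 2 7\n' | block_min
# @ a b
# 1 1 3
# @ a b
# 2 2 7
```

Iterating over the numeric order array instead of using for (key in line) is what keeps the blocks in input order.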
Upvotes: 1
Reputation: 2471
Here is yet another solution using awk:
awk '
/^@/{if(b!="")print a;b="";print $0;next}
b==""||$3<b{b=$3;a=$0}
END{print a}
' infile
/^@/{if(b!="")print a;b="";print $0;next}
For each line that starts with @:
if b is set, print a (at the first header, a and b are not set yet)
reset b so the previous block's minimum does not carry over, print the header line, and go to the next line
b==""||$3<b{b=$3;a=$0}
For every line that does not start with @:
if b is not set or $3 is less than b, store $3 in b and the whole line in a
END{print a}
At the end, print the line stored in a for the last block
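One case worth checking with this kind of running-minimum script is a later block whose minimum is larger than an earlier block's, since it only comes out right if b is cleared at every header. A small sketch with made-up input; the wrapper name min_per_block is hypothetical:

```shell
# Hypothetical wrapper around the running-minimum approach, with b cleared at each header.
min_per_block() {
  awk '
    /^@/ { if (b != "") print a; b = ""; print; next }  # flush previous block, then reset
    b == "" || $3 + 0 < b + 0 { b = $3; a = $0 }           # keep the smallest $3 seen so far
    END { if (b != "") print a }                           # flush the last block
  ' "$@"
}

# Made-up input: the second block has the larger minimum (2.5 > 0.5).
printf '@ 1 1\n0 0 0.9\n0 0 0.5\n@ 2 2\n0 0 3.0\n0 0 2.5\n' | min_per_block
# @ 1 1
# 0 0 0.5
# @ 2 2
# 0 0 2.5
```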
Upvotes: 0
Reputation: 46836
Here's a totally different awk approach.
awk 'BEGIN {RS="@"} {s=$3 OFS $4 OFS $5; n=$5; for (i=5;i<=NF;i+=3) {if ($i<n) {s=$(i-2) OFS $(i-1) OFS $i; n=$i} }} n!="" { print "@ " $1 OFS $2 ORS s }' infile
Or broken out for easier commenting:
BEGIN {
    RS="@"                             # "@" as our record separator
}
{
    s=$3 OFS $4 OFS $5                 # store the first data line...
    n=$5                               # ...and its third column as the minimum so far
    for (i=5;i<=NF;i+=3) {             # for every third field of the record (column 3 of each line),
        if ($i<n) {                    # test its value and
            s=$(i-2) OFS $(i-1) OFS $i # store the new line and the new
            n=$i                       # minimum if the condition matches
        }
    }
}
n!="" {                                # skip the empty record before the first "@"
    print "@ " $1 OFS $2 ORS s         # print records once we have them.
}
Upvotes: 1
Reputation: 16997
Using awk:
awk '/^@/{if(h){print h RS m}min=""; h=$0; next}min=="" || $3 < min{min=$3; m=$0}END{print h RS m}' infile
Using an array (the order of for (i in l) is unspecified in awk, so the blocks may not print in input order):
awk '/^@/{h=$0;min="";next}min==""||$3<min{min=$3;l[h]=$0}END{for(i in l)print i RS l[i]}' infile
More readable:
awk '/^@/{
    if(h){
        print h RS m
    }
    min=""; h=$0; next
}
min=="" || $3 < min{
    min=$3;
    m=$0
}
END{
    print h RS m
}
' infile
Using an array:
awk '/^@/{
    h=$0;min="";
    next
}
min==""||$3<min{
    min=$3;
    l[h]=$0
}
END{
    for(i in l)
        print i RS l[i]
}
' infile
Test Results:
$ cat infile
@ 62.65 -50.35
0 1.50 1.676
1.67 1.50 1.677
1.67 2.25 1.423
2.90 2.25 2.902
2.90 4.95 2.903
3.04 4.95 3.049
@ 63.61 -50.45
0 1.50 1.654
3.42 1.50 1.875
3.43 2.19 3.430
5.31 2.19 1.032
5.32 6.23 5.320
5.43 6.23 5.434
Output-1 (recommended)
$ awk '/^@/{if(h){print h RS m}min=""; h=$0; next}min=="" || $3 < min{min=$3; m=$0}END{print h RS m}' infile
@ 62.65 -50.35
1.67 2.25 1.423
@ 63.61 -50.45
5.31 2.19 1.032
Output-2
$ awk '/^@/{h=$0;min="";next}min==""||$3<min{min=$3;l[h]=$0}END{for(i in l)print i RS l[i]}' infile
@ 62.65 -50.35
1.67 2.25 1.423
@ 63.61 -50.45
5.31 2.19 1.032
Upvotes: 3