Reputation: 27

Find smallest value in column for each block using awk

I would like to find the smallest value of column $3 for each block, where each block starts with a "header line" of the form '@ value1 value2'. The file looks like this:

@ 62.65 -50.35
0 1.50 1.676
1.67 1.50 1.677
1.67 2.25 1.423
2.90 2.25 2.902
2.90 4.95 2.903
3.04 4.95 3.049
@ 63.61 -50.45
0 1.50 1.654
3.42 1.50 1.875
3.43 2.19 3.430
5.31 2.19 1.032
5.32 6.23 5.320
5.43 6.23 5.434

After this, I want to print the header as well as the whole line with the smallest $3.

So, my output file should look like this:

@ 62.65 -50.35
1.67 2.25 1.423
@ 63.61 -50.45
5.31 2.19 1.032

This is what I tried so far:

awk '{if (!/^@/) {if ($3 < min) {$3=min;} print $1, $2, $3, min; min = $3;} else if (/^@/) min = 10; print $1 $2 $3;}' input.txt > output.txt

I am having trouble separating the blocks and "resetting" min (I tried setting the 'start' value high, i.e. 10).

I am new to programming and have mainly used awk so far. If you could help me out, that would be really great! Thanks so much! Cheers, Isi

Upvotes: 2

Answers (4)

paxdiablo

Reputation: 881303

Awk can do this quite easily, as per the following script:

awk '
    $1=="@"    { first=1; key=$0; next }
    first==1   { lowest=$3; line[key]=$0; first=0; next }
               { if ($3 < lowest) { lowest=$3; line[key]=$0 } }
    END        { for (key in line) { printf "%s\n%s\n", key, line[key] } }
' <<EOF
@ 62.65 -50.35
0 1.50 1.676
1.67 1.50 1.677
1.67 2.25 1.423
2.90 2.25 2.902
2.90 4.95 2.903
3.04 4.95 3.049
@ 63.61 -50.45
0 1.50 1.654
3.42 1.50 1.875
3.43 2.19 3.430
5.31 2.19 1.032
5.32 6.23 5.320
5.43 6.23 5.434
EOF

As requested, the output is:

@ 62.65 -50.35
1.67 2.25 1.423
@ 63.61 -50.45
5.31 2.19 1.032

Breaking out the code for clarification:

$1=="@" {                # For all header lines:
    first=1              #    Flag that you're starting a new block.
    key=$0               #    Save the key.
    next                 #    Go back for next line.
}
first==1 {               # For first line in each block:
    lowest=$3            #    It must be lowest.
    line[key]=$0         #    Store line.
    first=0              #    Now processing subsequent lines in block.
    next                 #    Go back for next line.
}
{                        # For non-first-lines-in-block:
    if ($3 < lowest) {   #    Only if this one is lower.
        lowest=$3        #    Store value and line.
        line[key]=$0
    }
}
END {                    # At end, simply output associative array.
    for (key in line) {
        printf "%s\n%s\n", key, line[key]
    }
}

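Keep in mind this assumes the header lines are unique. If there can be duplicates and you want to treat them distinctly, you can create the key from a combination of NR and $0. Also note that awk does not define the order in which for (key in line) visits the array, so the blocks are not guaranteed to come out in input order.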
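For what it's worth, here is a minimal sketch of a duplicate-safe variant of the idea above. It is only an illustration: it indexes blocks with a running counter instead of NR (my substitution; any value that is unique per header would do), and it prints with a numeric loop so the blocks come out in input order. The function name min_per_block and the variables n, hdr, best, low and fresh are made up for this example.

```shell
# Sketch: per-block minimum of column 3, blocks indexed by order of appearance.
# All names here (min_per_block, n, hdr, best, low, fresh) are illustrative.
min_per_block() {
    awk '
        $1 == "@" { n++; hdr[n] = $0; fresh = 1; next }   # header: open a new block
        n == 0    { next }                                 # skip data before the first header
        fresh || $3 + 0 < low {                            # first line of block, or a new minimum
            low = $3 + 0; best[n] = $0; fresh = 0
        }
        END {
            for (i = 1; i <= n; i++)                       # numeric loop keeps input order
                if (i in best) printf "%s\n%s\n", hdr[i], best[i]
        }
    ' "$@"
}
```

Called as min_per_block input.txt, or with the data piped in on standard input. Headers that have no data lines under them are silently skipped.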

Upvotes: 1

ctac_

Reputation: 2471

Here is yet another solution using awk.

awk '
/^@/{if(a!="")print a;print $0;a=b="";next}
b==""||$3<b{b=$3;a=$0}
END{if(a!="")print a}
' infile

/^@/{if(a!="")print a;print $0;a=b="";next}

For a line that starts with @:

if a holds a line from the previous block, print it (at the first header, a is still empty)

print the header, clear a and b so that the minimum starts afresh for the new block, and go to the next line

b==""||$3<b{b=$3;a=$0}

For every line that does not start with @:

if b is empty or $3 is less than b, keep $3 in b and keep the line in a

END{if(a!="")print a}

At the end, print the line stored in a for the last block
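A case worth checking with any one-pass version like this is a block whose minimum is larger than the previous block's minimum, because the running minimum has to be cleared at each header for that to come out right. The following is a small sketch of that check, not a definitive implementation; the function name block_min and the test data are invented for illustration.

```shell
# Sketch: streaming per-block minimum with an explicit reset at each header.
# block_min is a hypothetical name; a holds the best line, b the best value.
block_min() {
    awk '
        /^@/ { if (a != "") print a; print; a = b = ""; next }   # flush previous block, clear state
        b == "" || $3 + 0 < b + 0 { b = $3; a = $0 }              # first line of block, or smaller $3
        END  { if (a != "") print a }                             # flush the last block
    ' "$@"
}

# Second block has a larger minimum (4) than the first block (1).
printf '@ h1\n0 0 1\n0 0 2\n@ h2\n0 0 5\n0 0 4\n' | block_min
```

Without the reset, the second block would keep reporting the line from the first block, since neither 5 nor 4 is smaller than 1.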

Upvotes: 0

ghoti

Reputation: 46836

Here's a totally different awk approach.

awk 'BEGIN {RS="@"} {s=$3 OFS $4 OFS $5; n=$5; for (i=5;i<=NF;i+=3) {if ($i<n) {s=$(i-2) OFS $(i-1) OFS $i; n=$i} }} n { print "@ " $1 OFS $2 ORS s }' infile

Or broken out for easier commenting:

BEGIN {
  RS="@"                          # "@" as our record separator
}

{
  s=$3 OFS $4 OFS $5              # store the first data line...
  n=$5                            # ...and its third column as the minimum so far
  for (i=5;i<=NF;i+=3) {          # for every 3rd field of the record,
    if ($i<n) {                   # test its value and
      s=$(i-2) OFS $(i-1) OFS $i  # store the new line and the new
      n=$i                        # minimum if the condition matches
    }
  }
}

n {
  print "@ " $1 OFS $2 ORS s      # print records once we have them.
}
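To make the field arithmetic in this approach easier to follow, here is a small sketch that only shows how a whole block becomes one record when RS is "@". With the default field separator, newlines count as whitespace, so the two header values land in $1 and $2 and the third column of each data line lands in $5, $8, $11 and so on. The function name show_fields is made up for this illustration.

```shell
# Sketch: show how each "@"-delimited block is split into fields.
# show_fields is a hypothetical helper, not part of the answer above.
show_fields() {
    awk 'BEGIN { RS = "@" }
         NF {                                   # skip the empty record before the first "@"
             printf "NF=%d header=%s,%s third-column values:", NF, $1, $2
             for (i = 5; i <= NF; i += 3)       # $5, $8, $11, ... are the third columns
                 printf " %s", $i
             print ""
         }' "$@"
}

printf '@ 1 2\n0 1 1.5\n0 1 0.5\n' | show_fields
# prints: NF=8 header=1,2 third-column values: 1.5 0.5
```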

Upvotes: 1

Akshay Hegde

Reputation: 16997

Using awk:

awk '/^@/{if(h){print h RS m}min=""; h=$0; next}min=="" || $3 < min{min=$3; m=$0}END{print h RS m}' infile

Using an array:

awk '/^@/{h=$0;min="";next}min==""||$3<min{min=$3;l[h]=$0}END{for(i in l)print i RS l[i]}' infile

More readable:

awk '
    /^@/ {
        if (h) {
            print h RS m
        }
        min = ""; h = $0; next
    }
    min == "" || $3 < min {
        min = $3
        m = $0
    }
    END {
        print h RS m
    }
' infile

Using an array:

awk '
    /^@/ {
        h = $0; min = ""
        next
    }
    min == "" || $3 < min {
        min = $3
        l[h] = $0
    }
    END {
        for (i in l)
            print i RS l[i]
    }
' infile

Test Results:

$ cat infile
@ 62.65 -50.35
0 1.50 1.676
1.67 1.50 1.677
1.67 2.25 1.423
2.90 2.25 2.902
2.90 4.95 2.903
3.04 4.95 3.049
@ 63.61 -50.45
0 1.50 1.654
3.42 1.50 1.875
3.43 2.19 3.430
5.31 2.19 1.032
5.32 6.23 5.320
5.43 6.23 5.434

Output-1 (recommended)

$ awk '/^@/{if(h){print h RS m}min=""; h=$0; next}min=="" || $3 < min{min=$3; m=$0}END{print h RS m}' infile
@ 62.65 -50.35
1.67 2.25 1.423
@ 63.61 -50.45
5.31 2.19 1.032

Output-2

$ awk '/^@/{h=$0;min="";next}min==""||$3<min{min=$3;l[h]=$0}END{for(i in l)print i RS l[i]}' infile
@ 62.65 -50.35
1.67 2.25 1.423
@ 63.61 -50.45
5.31 2.19 1.032

Upvotes: 3
