caffein

Reputation: 627

How can I print the line number of a record in awk?

I am using awk to process multi-line records, where the number of fields per record is unknown. This is to help filter records in a very large file, so it would be helpful to know the input line number at which each returned record starts. I tried incrementing a variable for each record, but that seems hacky. Is there a better way to do this?

Data example (with line numbers included):

1 | data1 - good
2 |    foo bar
3 |
4 | data2 - bad
5 |    foo bar
6 |    pet cat
7 |    name snuggles
8 |
9 | data3 - good
10|    foo bar
11|    color blue

Code Example:

BEGIN {RS =""; FS="\n"; ORS="\n\n"; OFS=""; x=0}
{
  { x += NF + 1; }
  { if ($1 ~ /bad/) { next; } }
  { print "[", x - NF, "]\n", $0; }
}

The output I'm looking for would be something like this:

[1]
data1 - good
    foo bar

[9]
data3 - good
    foo bar
    color blue

Is there a better way to do this that I'm not seeing?

Upvotes: 3

Views: 1810

Answers (4)

stack0114106

Reputation: 8791

If Perl is an option, you could try the following:

$ cat caffein.txt
data1 - good
   foo bar

data2 - bad
   foo bar
   pet cat
   name snuggles

data3 - good
   foo bar
   color blue

$ perl -0777 -ne ' s/^/++$x." "/mge; while(/(^\d+)(\s*data.+?good.+?)(\n\d+\s+\n\d+\s+|\Z)/gms) { $x="[$1] $2\n\n";$x=~s/^\d+/ /mg; print $x } ' caffein.txt
[1]  data1 - good
     foo bar

[9]  data3 - good
     foo bar
     color blue


$

Or, with a negative lookahead intended to avoid matching "bad":

$ perl -0777 -ne ' s/^/++$x." "/mge; while(/(^\d+)(\s*data.+?(?!bad).+?)(\n\d+\s+\n\d+\s+|\Z)/gms) { $x="[$1] $2\n\n";$x=~s/^\d+/ /mg; print $x } ' caffein.txt

Upvotes: 0

Ed Morton

Reputation: 204548

Your approach doesn't seem bad, though I might tweak it to:

$ cat tst.awk
BEGIN { RS=""; ORS="\n\n"; FS="\n" }
{
    nr += prevNf + 1
    if ($1 ~ /good/) {
        print "[" nr "]\n" $0
    }
    prevNf = NF
}

$ awk -f tst.awk file
[1]
data1 - good
   foo bar

[9]
data3 - good
   foo bar
   color blue

But here's an alternative that reads the file line by line and stores each line's NR alongside its text:

$ cat tst.awk
!NF { prt(); next }
{
    nrs[++numLines] = NR
    rec[numLines]   = $0
}
END { prt() }

function prt(   lineNr) {
    if (rec[1] ~ /good/) {
        printf "[%d]\n", nrs[1]
        for (lineNr=1; lineNr<=numLines; lineNr++) {
            print rec[lineNr]
        }
        print ""
    }
    delete rec
    numLines = 0
}

$ awk -f tst.awk file
[1]
data1 - good
   foo bar

[9]
data3 - good
   foo bar
   color blue

With the above, you aren't limited to testing for good or bad on a single line: you can test any line of the record, and you can print the input line number for any or all lines of each record if you like.
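As a rough sketch of that last point, the variant below prints every line of each "good" record prefixed with its input line number. The sample input is a hypothetical copy of the question's data without the line-number prefixes, and the `N: ` output format is just an assumption for illustration.

```shell
# Hypothetical sample input mirroring the question's data (no line-number prefixes).
input=$(printf '%s\n' \
  'data1 - good' '   foo bar' '' \
  'data2 - bad' '   foo bar' '   pet cat' '   name snuggles' '' \
  'data3 - good' '   foo bar' '   color blue')

out=$(printf '%s\n' "$input" | awk '
!NF { prt(); next }                 # blank line: flush the buffered record
{ nrs[++n] = NR; rec[n] = $0 }      # remember each line and its input line number
END { prt() }                       # flush the final record

function prt(   i) {
    if (n && rec[1] ~ /good/) {
        for (i = 1; i <= n; i++)
            printf "%d: %s\n", nrs[i], rec[i]
        print ""
    }
    n = 0    # stale entries are overwritten before they are read again
}')
printf '%s\n' "$out"
```

Because the numbers come straight from NR, this should keep working even if records are separated by more than one blank line.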

Upvotes: 1

RavinderSingh13

Reputation: 133760

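Could you please try the following? It was tested with your samples only, and it assumes the "N |" line-number prefixes shown in your data example are literally present in the input.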

awk '
/data[0-9]+/{
  flag=$NF=="bad"?"":1
  count=""
}
flag && NF>2{
  if(++count==1){
    print "["$1"]"
    sub(/.*\| /,"")
  }
  sub(/.*\|/,"")
  print
}'   Input_file

Upvotes: 1

jas

Reputation: 10865

In general I think your approach is fine, and I wouldn't consider it hacky.

You might consider some minor tweaks to make it a tiny bit simpler:

BEGIN {RS =""; FS="\n"; ORS="\n\n"; OFS=""; x=1}
!($1 ~ /bad/) { print "[", x, "]\n", $0; }
{ x += NF + 1; }
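One caveat, sketched under the assumption that your real file might not be as regular as the sample: with RS="" awk treats any run of blank lines as a single record separator, so a counter built from NF + 1 only stays accurate when records are separated by exactly one blank line and the file has no leading blank lines. The two-record input below is hypothetical and exists only to show the drift.

```shell
# Hypothetical input: TWO blank lines separate the records,
# so "b - good" really sits on input line 5.
out=$(printf 'a - good\n  foo\n\n\nb - good\n  bar\n' | awk '
BEGIN { RS = ""; FS = "\n"; x = 1 }
{ print "[" x "] " $1 }     # x is the computed start line of this record
{ x += NF + 1 }             # assumes exactly one blank line after each record
')
printf '%s\n' "$out"
# prints:
# [1] a - good
# [4] b - good
```

If that can happen in your data, an approach that reads line by line and uses NR directly avoids the problem.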

Upvotes: 2
