Reputation: 627
I am using awk to process multi-line records, where the number of fields per record is unknown. This is to help filter records in a very large file, so it would be helpful to know the line number at which each returned record starts. I tried incrementing a variable for each record, but that seems hacky. Is there a better way to do this?
Data example (with line numbers included):
1 | data1 - good
2 | foo bar
3 |
4 | data2 - bad
5 | foo bar
6 | pet cat
7 | name snuggles
8 |
9 | data3 - good
10| foo bar
11| color blue
Code Example:
BEGIN {RS =""; FS="\n"; ORS="\n\n"; OFS=""; x=0}
{
    { x += NF + 1; }
    { if ($1 ~ /bad/) { next; } }
    { print "[", x - NF, "]\n", $0; }
}
The output I'm looking for would be something like this:
[1]
data1 - good
foo bar
[9]
data3 - good
foo bar
color blue
Is there a better way to do this that I'm not seeing?
Upvotes: 3
Views: 1810
Reputation: 8791
If Perl is an option, you could try the following:
$ cat caffein.txt
data1 - good
foo bar
data2 - bad
foo bar
pet cat
name snuggles
data3 - good
foo bar
color blue
$ perl -0777 -ne ' s/^/++$x." "/mge; while(/(^\d+)(\s*data.+?good.+?)(\n\d+\s+\n\d+\s+|\Z)/gms) { $x="[$1] $2\n\n";$x=~s/^\d+/ /mg; print $x } ' caffein.txt
[1] data1 - good
foo bar
[9] data3 - good
foo bar
color blue
$
Or, with a negative lookahead intended to skip records that match "bad":
$ perl -0777 -ne ' s/^/++$x." "/mge; while(/(^\d+)(\s*data.+?(?!bad).+?)(\n\d+\s+\n\d+\s+|\Z)/gms) { $x="[$1] $2\n\n";$x=~s/^\d+/ /mg; print $x } ' caffein.txt
Upvotes: 0
Reputation: 204548
Your approach doesn't seem bad, though I might tweak it to:
$ cat tst.awk
BEGIN { RS=""; ORS="\n\n"; FS="\n" }
{
    nr += prevNf + 1
    if ($1 ~ /good/) {
        print "[" nr "]\n" $0
    }
    prevNf = NF
}
$ awk -f tst.awk file
[1]
data1 - good
foo bar
[9]
data3 - good
foo bar
color blue
but here's an alternative:
$ cat tst.awk
!NF { prt(); next }
{
    nrs[++numLines] = NR
    rec[numLines] = $0
}
END { prt() }
function prt( lineNr) {
    if (rec[1] ~ /good/) {
        printf "[%d]\n", nrs[1]
        for (lineNr=1; lineNr<=numLines; lineNr++) {
            print rec[lineNr]
        }
        print ""
    }
    delete rec
    numLines = 0
}
$ awk -f tst.awk file
[1]
data1 - good
foo bar
[9]
data3 - good
foo bar
color blue
With this version you can test more than one line of a record for good or bad, and you can print the input line number of any or all lines in each record.
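As a rough sketch of that last point, the variant below prefixes every line of a kept record with its input line number instead of printing one `[N]` header. The file name `sample.txt`, the wrapper function `number_good_records`, and the `N: ` prefix format are assumptions made for illustration; the buffering logic follows the alternative script above.

```shell
# Recreate the sample input from the question (file name is an assumption).
printf '%s\n' 'data1 - good' 'foo bar' '' \
    'data2 - bad' 'foo bar' 'pet cat' 'name snuggles' '' \
    'data3 - good' 'foo bar' 'color blue' > sample.txt

# Hypothetical wrapper: buffer each record line by line, remembering NR for
# every line, then print "NR: line" for records whose first line matches /good/.
number_good_records() {
    awk '
    !NF { prt(); next }                 # a blank line ends the current record
    { nrs[++numLines] = NR; rec[numLines] = $0 }
    END { prt() }                       # flush the final record
    function prt(   i) {
        if (numLines && rec[1] ~ /good/) {
            for (i = 1; i <= numLines; i++)
                printf "%d: %s\n", nrs[i], rec[i]
            print ""
        }
        numLines = 0                    # stale entries are overwritten before reuse
    }
    ' "$1"
}

number_good_records sample.txt
```

Because every line keeps its own `NR`, this does not depend on how many blank lines separate the records.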
Upvotes: 1
Reputation: 133760
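Could you please try the following? It is tested with your samples only, and it assumes the line numbers and `|` separators are literally present in the input, as shown in your data example.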
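awk '
/data[0-9]+/{
    flag=$NF=="bad"?"":1    # keep the record unless its header line ends in "bad"
    count=""
}
flag && NF>2{
    if(++count==1){
        print "["$1"]"      # first line of the record: print its line number
    }
    sub(/.*\| ?/,"")        # strip the "N | " prefix, including the space after it
    print
}' Input_file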
Upvotes: 1
Reputation: 10865
In general I think your approach is fine, and I wouldn't consider it hacky.
You might consider some minor tweaks to make it a tiny bit simpler:
BEGIN {RS =""; FS="\n"; ORS="\n\n"; OFS=""; x=1}
!($1 ~ /bad/) { print "[", x, "]\n", $0; }
{ x += NF + 1; }
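To check the offset arithmetic, here is a small sketch that prints the computed starting line and line count of every record, including the "bad" one. The file name `records.txt` and the function name `record_starts` are assumptions for illustration. The arithmetic also assumes exactly one blank line between records, since paragraph mode (`RS=""`) treats any run of blank lines as a single separator.

```shell
# Recreate the sample input from the question (file name is an assumption).
printf '%s\n' 'data1 - good' 'foo bar' '' \
    'data2 - bad' 'foo bar' 'pet cat' 'name snuggles' '' \
    'data3 - good' 'foo bar' 'color blue' > records.txt

# Hypothetical helper: x holds the line number where the current record starts.
# After each record, skip its NF lines plus the one blank separator line.
record_starts() {
    awk 'BEGIN { RS = ""; FS = "\n"; x = 1 }
         { print "start=" x, "lines=" NF; x += NF + 1 }' "$1"
}

record_starts records.txt
```

On the sample data this reports starts of 1, 4, and 9, which matches the line numbers in the question.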
Upvotes: 2