Aaron Fi

Reputation: 10396

In sed or awk, how do I handle record separators which *may* span multiple lines?

My log file is:

 Wed Nov 12 blah blah blah blah cat1
 Wed Nov 12 blah blah blah blah
 Wed Nov 12 blah blah blah blah 
 Wed Nov 12 blah blah blah blah cat2
     more blah blah
     even more blah blah
 Wed Nov 12 blah blah blah blah cat3
 Wed Nov 12 blah blah blah blah cat4

I want to parse out the full multiline entries where cat is found on the first line. What's the best way to do this in sed and/or awk?

That is, I want the output to be:

 Wed Nov 12 blah blah blah blah cat1
 Wed Nov 12 blah blah blah blah cat2
     more blah blah
     even more blah blah
 Wed Nov 12 blah blah blah blah cat3
 Wed Nov 12 blah blah blah blah cat4

Upvotes: 3

Views: 1663

Answers (4)

user918938

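Another approach is to set RS to something other than the default \n. Note that treating a multi-character RS as a regular expression, and the \s escape, are gawk extensions rather than POSIX awk features. For example: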

$ awk -v Pre=Wed 'BEGIN {RS = "\\n?\\s*" Pre} /cat.\n?/ {print Pre $0}' file.log
Wed Nov 12 blah blah blah blah cat1
Wed Nov 12 blah blah blah blah cat2
     more blah blah
     even more blah blah
Wed Nov 12 blah blah blah blah cat3
Wed Nov 12 blah blah blah blah cat4

Upvotes: 0

flolo

Reputation: 15526

If you say every line that starts with a space is a continuation of the preceding one, it's easy with (g)awk. (This is from memory, so it may contain some minor typos; the additional line breaks are for readability.)

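awk ' BEGIN { multiline = 0 }
      # A line that does not start with a space begins a new record.
      ! /^ / { if (/cat/)
                 { print; multiline = 1 }
               else
                 multiline = 0
             }
      # A continuation line is printed only while inside a kept record.
        /^ / { if (multiline == 1)
                 print
             }
     ' yourfile

Here /cat/ stands in for whatever check decides whether a record should be output (in this case, the cat match on the first line).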

Upvotes: 1

activout.se

Reputation: 6116

Something like this?

awk 'function print_part() { if(cat) print part }  /^  / { part = part "\n" $0; next } /cat[0-9]$/ { print_part(); part = $0; cat = 1; next;  } { print_part(); cat=0} END { print_part() }' inputfile

The /^  / regexp (a line starting with two spaces) identifies continuation lines.

The /cat[0-9]$/ regexp identifies the starter lines you want to keep.
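
For readability, the same buffering idea can be sketched as a multi-line script. This is only an illustration under a few assumptions: the wrapper name print_matching_records and the awk variables pat, rec and keep are made up for this example, record-start lines are assumed to begin in column one, and any line starting with a space or tab is treated as a continuation.

```shell
# Hypothetical wrapper, usage: print_matching_records PATTERN < logfile
print_matching_records() {
  awk -v pat="$1" '
    # Continuation line (leading whitespace): append it to the buffered record.
    /^[ \t]/ { rec = rec "\n" $0; next }

    # Any other line starts a new record, so flush the previous one first.
    {
      if (keep) print rec
      rec  = $0
      keep = ($0 ~ pat)   # the decision is based on the first line only
    }

    # Flush the last buffered record at end of input.
    END { if (keep) print rec }
  '
}
```

It could be called as print_matching_records 'cat[0-9]$' < inputfile. Because the pattern is passed with -v, awk applies escape processing to it, so a pattern containing backslashes would need them doubled.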

Upvotes: 0

Adam Rosenfield

Reputation: 400502

Assuming your log file does not contain the control characters '\01' and '\02', and that a continued line begins with precisely four spaces, the following might work:

c1=`echo -en '\01'`
c2=`echo -en '\02'`
cat logfile | tr '\n' $c1 | sed "s/$c1    /$c2/g" | sed "s/$c1/\n/g" | grep cat | sed "s/$c2/\n    /g"

Explanation: this replaces each newline with ASCII 1 (a control character that should never appear in a log file) and each sequence "newline-space-space-space-space" with ASCII 2 (another control character). It then re-replaces ASCII 1 with newlines, so now each sequence of multiple lines is put into one line, with the old line breaks replaced by ASCII 2. This is grepped for cat, and then the ASCII 2's are re-replaced with the newline-space-space-space-space combination.
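
The steps in the explanation can also be wrapped in a function. This is a sketch rather than a drop-in replacement: the name filter_records is hypothetical, printf is used instead of echo -en to produce the control characters, and the two newline-restoring steps use tr and awk instead of sed with \n in the replacement (which GNU sed supports but not every sed does). The same assumptions apply: no \001 or \002 bytes in the input, and continuation lines begin with exactly four spaces.

```shell
# Hypothetical wrapper, usage: filter_records PATTERN < logfile
filter_records() {
  c1=$(printf '\001')   # temporary stand-in for every newline
  c2=$(printf '\002')   # stand-in for "newline followed by four spaces"

  tr '\n' "$c1" |                # 1. join the whole input into one line
    sed "s/$c1    /$c2/g" |      # 2. mark continuation breaks with \002
    tr "$c1" '\n' |              # 3. restore the real record boundaries
    grep "$1" |                  # 4. keep records that match the pattern
    awk -v c2="$c2" '{ gsub(c2, "\n    "); print }'   # 5. restore continuations
}
```

As in the original pipeline, grep sees the whole joined record, so a match on a continuation line also selects the record. To restrict the match to the first line, the pattern would have to exclude the \002 character before the match.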

Upvotes: 1
