Sherzad

Reputation: 435

Extract content between two patterns from a file

I want to extract the content between "The SUMMARY" and "End processing summary for first university", and likewise the content between "The SUMMARY" and "End processing summary for second university".

This is my file:

Logs here
...
...
...
More logs here
...
...

The SUMMARY
Total students: 1200
Total teachers: 10
Total subjects: 20
Total attendance: 12000
End processing summary for first university

Logs here
...
...
...
More logs here
...
...

The SUMMARY
Total students: 1500
Total teachers: 12
Total subjects: 15
Total attendance: 20000
End processing summary for second university

Logs here
...
...
...
More logs here
...
...

The following works well when both summary blocks are present:

firstUniversity=$(awk '/The SUMMARY/ && ++n == 1, /End processing summary for first university/' < theLog.log)

secondUniversity=$(awk '/The SUMMARY/ && ++n == 2, /End processing summary for second university/' < theLog.log)

However, sometimes either the summary for the first university or the summary for the second university is missing, and then the above code does not work.

First university block is missing

Logs here
...
...
...
More logs here
...
...
Logs here
...
...
...
More logs here
...
...

The SUMMARY
Total students: 1500
Total teachers: 12
Total subjects: 15
Total attendance: 20000
End processing summary for second university

Logs here
...
...
...
More logs here
...
...

Or the second university block is missing

Logs here
...
...
...
More logs here
...
...

The SUMMARY
Total students: 1200
Total teachers: 10
Total subjects: 20
Total attendance: 12000
End processing summary for first university

Logs here
...
...
...
More logs here
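
To illustrate the first case (only the second university block present), here is a minimal reproduction of what the two commands return. The make_sample helper and its shortened log lines are made up for illustration and stand in for theLog.log.

```shell
# Hypothetical sample: only the second university block is present.
make_sample() {
  printf '%s\n' 'Logs here' 'The SUMMARY' 'Total students: 1500' \
    'End processing summary for second university' 'More logs here'
}

# Same range expressions as above, reading the sample from standard input.
firstUniversity=$(make_sample | awk '/The SUMMARY/ && ++n == 1, /End processing summary for first university/')
secondUniversity=$(make_sample | awk '/The SUMMARY/ && ++n == 2, /End processing summary for second university/')

printf 'first:\n%s\n\nsecond:\n%s\n' "$firstUniversity" "$secondUniversity"
```

Here the first range starts at the only "The SUMMARY" line and, because its end line never appears, runs to the end of the file. So firstUniversity holds the second university's block plus the trailing logs, while secondUniversity stays empty because the counter never reaches 2.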

Is there a solution using either sed or awk?

Upvotes: 0

Views: 69

Answers (2)

anubhava

Reputation: 784898

A slightly different awk approach:

cat extract.awk

/The SUMMARY/ {                                       # match starting line
  s = $0                                              # set s to current line
  p = 1                                               # set flag p to 1
  next                                                # skip the rules below so the start line is not added twice
}
p {                                                   # if flag p is set
   s = s ORS $0                                       # keep adding lines to s
}
$0 ~ "End processing summary for " kw " university" { # when we find end line
   print s                                            # print full text
   p = 0                                              # reset p to 0
}

Then use it as:

firstUniversity="$(awk -v kw='first' -f extract.awk inputFile)"
secondUniversity="$(awk -v kw='second' -f extract.awk inputFile)"

Without using an awk script file:

firstUniversity="$(awk -v kw='first' '/The SUMMARY/{s=$0; p=1; next} p{s = s ORS $0}
$0 ~ "End processing summary for " kw " university"{print s; p=0}' inputFile)"

secondUniversity="$(awk -v kw='second' '/The SUMMARY/{s=$0; p=1; next} p{s = s ORS $0}
$0 ~ "End processing summary for " kw " university"{print s; p=0}' inputFile)"
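
To avoid repeating the program for each university, the inline command could be wrapped in a small shell function. This is only a sketch: the name extract_summary and its argument order are my own choices, not part of any tool. The `next` in the start rule keeps the start line from being appended a second time by the `p` rule.

```shell
# extract_summary KEYWORD FILE   (hypothetical helper name)
# Prints the summary block whose end line names KEYWORD, or nothing if
# that block is absent from FILE.
extract_summary() {
  awk -v kw="$1" '
    /The SUMMARY/ { s = $0; p = 1; next }   # start line: begin a fresh block
    p { s = s ORS $0 }                       # inside a block: keep collecting
    $0 ~ "End processing summary for " kw " university" {
      print s                                # end line for kw: emit the block
      p = 0
    }
  ' "$2"
}
```

It can then be called as firstUniversity="$(extract_summary first inputFile)", and a missing block shows up as an empty variable, which can be checked with [ -z "$firstUniversity" ].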

Upvotes: 1

RavinderSingh13

Reputation: 133428

Please try the following, written and tested with the shown samples. For clarity, this solution uses a found_university variable.

awk '
/The SUMMARY/{
   found_summary=1
   val=found_university=""
}
/End processing summary for first university|End processing summary for second university/{
   found_university=1
   if(found_university && found_summary){
     print val ORS $0
   }
   val=found_university=found_summary=""
}
found_summary{
   val=(val?val ORS:"")$0
}
'  Input_file

Alternatively, the following version does not use the found_university variable and simply checks whether a university end line occurs.

awk '
/The SUMMARY/{
   found_summary=1
   val=""
}
/End processing summary for first university|End processing summary for second university/{
   if(found_summary){
     print val ORS $0
   }
   val=found_summary=""
}
found_summary{
   val=(val?val ORS:"")$0
}
'   Input_file

Explanation: a detailed, line-by-line explanation of the above code (the comments are to the right of each line).

awk '                                                                                             ##Start of the awk program.
/The SUMMARY/{                                                                                    ##If the line contains The SUMMARY, do the following.
   found_summary=1                                                                                ##Set the found_summary flag to 1.
   val=""                                                                                         ##Reset val to an empty string.
}
/End processing summary for first university|End processing summary for second university/{       ##If the line is an end line for either university, do the following.
   if(found_summary){                                                                             ##If found_summary is set, do the following.
     print val ORS $0                                                                             ##Print val, then ORS, then the current line.
   }
   val=found_summary=""                                                                           ##Reset val and found_summary.
}
found_summary{                                                                                    ##While found_summary is set, do the following.
   val=(val?val ORS:"")$0                                                                         ##Append the current line to val.
}
'  Input_file                                                                                       ##Name of the input file.
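
Since the program above prints every summary it finds in one stream, one possible extension is to write each block to its own file, named after the word between "for" and "university" on the end line. This is a sketch based only on the shown samples; the function name split_summaries, the OUTDIR argument and the .txt file names are assumptions for illustration.

```shell
# split_summaries FILE OUTDIR   (hypothetical helper name)
# Writes each summary block to OUTDIR/<name>.txt, e.g. first.txt, second.txt.
split_summaries() {
  awk -v dir="$2" '
  /The SUMMARY/ {                        # start of a block: collect afresh
     found_summary = 1
     val = ""
  }
  found_summary {                        # inside a block: append current line
     val = (val == "" ? "" : val ORS) $0
  }
  found_summary && /End processing summary for .* university/ {
     name = $0                           # derive the university word
     sub(/.*End processing summary for /, "", name)
     sub(/ university.*/, "", name)
     out = dir "/" name ".txt"
     print val > out                     # one file per university
     close(out)
     found_summary = 0
  }
  ' "$1"
}
```

After running split_summaries Input_file some_dir, a university whose block is missing simply has no file, so firstUniversity=$(cat some_dir/first.txt 2>/dev/null) yields an empty variable in that case.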

Upvotes: 1
