mjd

Reputation: 15

Use awk or grep to find a pattern that spans a line break

I have a code block similar to this:

           <service id="http-upgrade-service" class="HTTPUpgrade">
 <maxHeaderSize>65536</maxHeaderSize>

When I try to grep or awk for this pattern, it doesn't return these lines. Another part of the file also has a headerSize parameter, which is causing issues as well.

These are some of the options I have tried:

    awk '/<service id="http-upgrade-service" class="HTTPUpgrade"/ ,/<maxHeaderSize>65536</maxHeaderSize>/' file

    grep -n -E '<service id="http-upgrade-service" class="HTTPUpgrade">*\n<maxHeaderSize>65536<\/maxHeaderSize>' head -n 1 file

    grep -e '<service id="http-upgrade-service" class="HTTPUpgrade"> -e `<maxHeaderSize>65536</maxHeaderSize>' file

    grep -Pzl '(?s)<service id="http-upgrade-service" class="HTTPUpgrade">*\n.<maxHeaderSize>65536</maxHeaderSize>' file

    grep -oP '(?<=<service id="http-upgrade-service" class="HTTPUpgrade"> )\w+(?=<maxHeaderSize>65536</maxHeaderSize>)'

    awk '/<service id="http-upgrade-service" class="HTTPUpgrade">/ ,/<maxHeaderSize>65536</maxHeaderSize>/ {print}' file

I'm trying to match a pattern that spans both lines.

Upvotes: 0

Views: 146

Answers (1)

Ed Morton

Reputation: 203655

With GNU grep for -z, -o, and \s shorthand for [[:space:]]:

$ grep -zo '<service id="http-upgrade-service" class="HTTPUpgrade">\s*<maxHeaderSize>65536</maxHeaderSize>' file
<service id="http-upgrade-service" class="HTTPUpgrade">
 <maxHeaderSize>65536</maxHeaderSize>

You didn't show the expected output in your question, so I'm guessing you want the matching string; you can massage the command to suit if that's not what you wanted.
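As one possible way to massage it: if the goal is only to know whether the block is present, GNU grep's -q can be combined with -z so that the exit status carries the answer. This is a minimal sketch, not necessarily what the question needs; the sample input and the variable names file and result are made up for illustration.

```shell
# Hypothetical sample input written to a temp file (contents assumed from the question).
file=$(mktemp)
printf '%s\n' \
  '<service id="http-upgrade-service" class="HTTPUpgrade">' \
  ' <maxHeaderSize>65536</maxHeaderSize>' > "$file"

# -z reads the file as one NUL-terminated record; -q prints nothing and
# reports success or failure through the exit status. Requires GNU grep.
if grep -zq '<service id="http-upgrade-service" class="HTTPUpgrade">\s*<maxHeaderSize>65536</maxHeaderSize>' "$file"; then
  result=found
else
  result=missing
fi
echo "$result"
rm -f "$file"
```

Because the regexp requires only whitespace between the two tags, a headerSize entry elsewhere in the file should not satisfy it on its own.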

You can use the same regexp in GNU sed -z or GNU awk -v RS='^$', all of which read the whole file into memory at once just like GNU grep -z:

$ sed -Ez 's:.*(<service id="http-upgrade-service" class="HTTPUpgrade">\s*<maxHeaderSize>65536</maxHeaderSize>).*:\1:' file
<service id="http-upgrade-service" class="HTTPUpgrade">
 <maxHeaderSize>65536</maxHeaderSize>

$ awk -v RS='^$' 'match($0,/.*(<service id="http-upgrade-service" class="HTTPUpgrade">\s*<maxHeaderSize>65536<\/maxHeaderSize>).*/,a){print a[1]}' file
<service id="http-upgrade-service" class="HTTPUpgrade">
 <maxHeaderSize>65536</maxHeaderSize>

or use any POSIX awk in paragraph mode, since there are no blank lines within the block you're trying to match:

$ awk -v RS='' 'match($0,/<service id="http-upgrade-service" class="HTTPUpgrade">[[:space:]]*<maxHeaderSize>65536<\/maxHeaderSize>/){print substr($0,RSTART,RLENGTH)}' file
<service id="http-upgrade-service" class="HTTPUpgrade">
 <maxHeaderSize>65536</maxHeaderSize>

If you don't have a POSIX awk, replace [[:space:]] with [ \t\n] and the above will then work in any awk, assuming your input contains none of the other whitespace characters (carriage return, form feed, vertical tab); if it does, add them to the bracket expression.
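Spelled out, the paragraph-mode command with the explicit bracket expression might look like the sketch below. The sample input and the variable names file and out are assumptions for illustration; the input is assumed to use plain spaces, tabs, and newlines only.

```shell
# Hypothetical sample input written to a temp file (contents assumed from the question).
file=$(mktemp)
printf '%s\n' \
  '<service id="http-upgrade-service" class="HTTPUpgrade">' \
  ' <maxHeaderSize>65536</maxHeaderSize>' \
  '</service>' > "$file"

# RS='' selects paragraph mode: each blank-line-separated block is one record,
# so the regexp can span the line break. [ \t\n] stands in for [[:space:]].
out=$(awk -v RS='' 'match($0,/<service id="http-upgrade-service" class="HTTPUpgrade">[ \t\n]*<maxHeaderSize>65536<\/maxHeaderSize>/){print substr($0,RSTART,RLENGTH)}' "$file")
printf '%s\n' "$out"
rm -f "$file"
```

If the file has Windows line endings, adding \r to the bracket expression would be the corresponding adjustment.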

Upvotes: 1
