Reputation: 15
I have a code block similar to this:
<service id="http-upgrade-service" class="HTTPUpgrade">
<maxHeaderSize>65536</maxHeaderSize>
When I try to grep or awk for this pattern, it doesn't return these lines. Another part of the file has a headerSize parameter, which is also causing false matches.
These are some of the options I have tried:
awk '/<service id="http-upgrade-service" class="HTTPUpgrade"/ ,/<maxHeaderSize>65536</maxHeaderSize>/' file
grep -n -E '<service id="http-upgrade-service" class="HTTPUpgrade">*\n<maxHeaderSize>65536<\/maxHeaderSize>' head -n 1 file
grep -e '<service id="http-upgrade-service" class="HTTPUpgrade"> -e `<maxHeaderSize>65536</maxHeaderSize>' file
grep -Pzl '(?s)<service id="http-upgrade-service" class="HTTPUpgrade">*\n.<maxHeaderSize>65536</maxHeaderSize>' file
grep -oP '(?<=<service id="http-upgrade-service" class="HTTPUpgrade"> )\w+(?=<maxHeaderSize>65536</maxHeaderSize>)'
awk '/<service id="http-upgrade-service" class="HTTPUpgrade">/ ,/<maxHeaderSize>65536</maxHeaderSize>/ {print}' file
I'm trying to match a pattern that spans both lines and includes both values.
Upvotes: 0
Views: 146
Reputation: 203655
With GNU grep for -z, -o, and the \s shorthand for [[:space:]]:
$ grep -zo '<service id="http-upgrade-service" class="HTTPUpgrade">\s*<maxHeaderSize>65536</maxHeaderSize>' file
<service id="http-upgrade-service" class="HTTPUpgrade">
<maxHeaderSize>65536</maxHeaderSize>
You didn't show the expected output in your question, so I'm guessing you wanted the matching string; you can massage this to suit if that's not what you wanted.
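One thing to be aware of if you capture that output: with -z, GNU grep terminates each output item with a NUL byte rather than a newline. A minimal sketch of handling that, assuming GNU grep and a made-up sample file (the temp file and the variable names are only for illustration):

```shell
# Hypothetical sample input, made up for illustration.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
<service id="http-upgrade-service" class="HTTPUpgrade">
    <maxHeaderSize>65536</maxHeaderSize>
</service>
EOF

# -z makes GNU grep end each output item with NUL, so convert NULs to
# newlines before capturing the result in a variable.
matched=$(grep -zo '<service id="http-upgrade-service" class="HTTPUpgrade">\s*<maxHeaderSize>65536</maxHeaderSize>' "$tmpfile" | tr '\0' '\n')
printf '%s\n' "$matched"
rm -f "$tmpfile"
```

Without the tr step, the trailing NUL ends up in the pipeline and some shells warn about it when it reaches a command substitution.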
You can use the same regexp in GNU sed -z or GNU awk -v RS='^$', both of which read the whole file into memory at once, just like GNU grep -z:
$ sed -Ez 's:.*(<service id="http-upgrade-service" class="HTTPUpgrade">\s*<maxHeaderSize>65536</maxHeaderSize>).*:\1:' file
<service id="http-upgrade-service" class="HTTPUpgrade">
<maxHeaderSize>65536</maxHeaderSize>
$ awk -v RS='^$' 'match($0,/.*(<service id="http-upgrade-service" class="HTTPUpgrade">\s*<maxHeaderSize>65536<\/maxHeaderSize>).*/,a){print a[1]}' file
<service id="http-upgrade-service" class="HTTPUpgrade">
<maxHeaderSize>65536</maxHeaderSize>
Or use any POSIX awk in paragraph mode, since there are no blank lines within the block you're trying to match:
$ awk -v RS='' 'match($0,/<service id="http-upgrade-service" class="HTTPUpgrade">[[:space:]]*<maxHeaderSize>65536<\/maxHeaderSize>/){print substr($0,RSTART,RLENGTH)}' file
<service id="http-upgrade-service" class="HTTPUpgrade">
<maxHeaderSize>65536</maxHeaderSize>
And if you don't have a POSIX awk, replace [[:space:]] with [ \t\n]; the above will then work in any awk, assuming your input contains none of the other whitespace characters (carriage return, form feed, vertical tab). If it does, add them to the list in the bracket expression.
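For reference, here is a sketch of what that substitution looks like in the paragraph-mode command. The sample file is made up for illustration (including a second block with a headerSize element, as mentioned in the question), so adjust it to your real input:

```shell
# Hypothetical sample input; the "other-service" block is made up for illustration.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
<service id="other-service" class="Other">
    <headerSize>65536</headerSize>
</service>

<service id="http-upgrade-service" class="HTTPUpgrade">
    <maxHeaderSize>65536</maxHeaderSize>
</service>
EOF

# Paragraph mode (RS='') splits records on blank lines; [ \t\n] stands in
# for [[:space:]] so no POSIX character class is needed.
result=$(awk -v RS='' 'match($0,/<service id="http-upgrade-service" class="HTTPUpgrade">[ \t\n]*<maxHeaderSize>65536<\/maxHeaderSize>/){print substr($0,RSTART,RLENGTH)}' "$tmpfile")
printf '%s\n' "$result"
rm -f "$tmpfile"
```

Only the second block matches, because the regexp requires both the service id and the maxHeaderSize element, so the unrelated headerSize entry is ignored.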
Upvotes: 1