misteralexander

Reputation: 488

egrep to match multiple lines

I have several Apache vHost configurations across several hosts. I'm trying to write a Bash script that will iterate through each host and search the .conf file on each one, pulling out the first (only the first) <VirtualHost> block. I've tried writing a regex to match it, but it's just not working. Here's the code I've tried:

    #!/bin/bash
    egrep -o '(\<VirtualHost\>)(.*)(\<\/VirtualHost\>)' -m1

Since .* doesn't match newlines, I even tried this:

    #!/bin/bash
    egrep -o '(\<VirtualHost\>)(.*[\S]*)(\<\/VirtualHost\>)' -m1

I still get nothing. :-(

I don't understand what I'm doing wrong here. Here is a sample of the data I'm trying to match:

    <VirtualHost apache-frontend:80>
            ServerAdmin     [email protected]
            ServerName      domain.com
            DocumentRoot    /path/to/my/doc/root

            RewriteEngine   On
            Include         include.d/global/rewrite.conf
            RewriteRule     ^(.*)$ http://www.domain.com$1 [R=301,L]
    </VirtualHost>

    <VirtualHost apache-frontend:80>
            ServerAdmin     [email protected]
            ServerName      domain.com
            DocumentRoot    /path/to/my/doc/root

            RewriteEngine   On
            Include         include.d/global/rewrite.conf
            RewriteRule     ^(.*)$ http://www.domain.com$1 [R=301,L]
    </VirtualHost>

    <VirtualHost apache-frontend:80>
            ServerAdmin     [email protected]
            ServerName      domain.com
            DocumentRoot    /path/to/my/doc/root

            RewriteEngine   On
            Include         include.d/global/rewrite.conf
            RewriteRule     ^(.*)$ http://www.domain.com$1 [R=301,L]
    </VirtualHost>

Upvotes: 2

Views: 3085

Answers (6)

scrat.squirrel

Reputation: 3826

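This is possible with GNU grep: -P enables Perl-compatible regexes, -z treats the whole input as a single record so a match can span newlines, -a treats the data as text, and -o prints only the matched part.

For example, to find every match in an HTML file: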

grep -Pazo "(?s)<div\s+class=\"version\">.*?Version\s+[\.0-9]+"
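
To see why the (?s) and .*? parts of that pattern matter, here is a minimal Python sketch of the same idea applied to data shaped like the question's. The sample string and the names SAMPLE and first_vhost are made up for illustration; Python's re.DOTALL plays the role of (?s).

```python
import re

# Hypothetical sample: two short blocks standing in for the real config.
SAMPLE = (
    "<VirtualHost apache-frontend:80>\n"
    "    ServerName one.example\n"
    "</VirtualHost>\n"
    "\n"
    "<VirtualHost apache-frontend:80>\n"
    "    ServerName two.example\n"
    "</VirtualHost>\n"
)

PATTERN = r"<VirtualHost\b.*?</VirtualHost>"


def first_vhost(text):
    """Return the first <VirtualHost>...</VirtualHost> block, or None."""
    # re.DOTALL lets "." cross newlines; ".*?" stops at the first closing tag.
    match = re.search(PATTERN, text, re.DOTALL)
    return match.group(0) if match else None


# Without DOTALL, "." stops at each newline, so nothing can span a block.
no_dotall = re.search(PATTERN, SAMPLE)

print(first_vhost(SAMPLE))
print(no_dotall)
```

The lazy quantifier is what keeps the match from running on to the last closing tag in the file.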

Upvotes: 0

Todd A. Jacobs

Reputation: 84343

Use Perl

Perl is part of the Linux Standard Base, and is also standard on OS X, so it should be highly available on most modern systems. Perl is great at multiline text tasks. For example:

$ perl -ne '
      if (/VirtualHost/ ... m!/VirtualHost!) {
          print unless /VirtualHost/;
          exit if m!/VirtualHost!;
      }' /tmp/corpus

This one-liner will:

  1. Loop over the input file until it finds a VirtualHost block.
  2. Print every line within that block, excluding the opening and closing tags.
  3. Exit the script when it sees the end of a VirtualHost block, ensuring that it only shows the first block.

Given your corpus, this will correctly yield:

           ServerAdmin     [email protected]
           ServerName      domain.com
           DocumentRoot    /path/to/my/doc/root

           RewriteEngine   On
           Include         include.d/global/rewrite.conf
           RewriteRule     ^(.*)$ http://www.domain.com$1 [R=301,L]
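
For readers who prefer Python, the three steps listed above can be sketched as a small generator. This is only an illustration of the same logic, not a translation of Perl's range operator; the function name first_block_body and the sample lines are assumptions.

```python
def first_block_body(lines):
    """Yield the lines inside the first VirtualHost block, tags excluded."""
    inside = False
    for line in lines:
        if not inside:
            # Step 1: skip input until an opening tag appears.
            inside = "<VirtualHost" in line
        elif "</VirtualHost>" in line:
            # Step 3: stop at the closing tag so later blocks are ignored.
            return
        else:
            # Step 2: emit everything between the two tags.
            yield line


# Hypothetical input standing in for the lines of a .conf file.
sample = [
    "# leading comment\n",
    "<VirtualHost apache-frontend:80>\n",
    "    ServerName one.example\n",
    "    DocumentRoot /srv/one\n",
    "</VirtualHost>\n",
    "<VirtualHost apache-frontend:80>\n",
    "    ServerName two.example\n",
    "</VirtualHost>\n",
]

body = list(first_block_body(sample))
print("".join(body), end="")
```

Because the function only iterates, it should also accept an open file object in place of the list.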

Upvotes: 0

Benjamin W.

Reputation: 52132

With GNU sed:

$ sed -n '/<VirtualHost/,/<\/VirtualHost>/{p;/<\/VirtualHost>/q}' infile
    <VirtualHost apache-frontend:80>
            ServerAdmin     [email protected]
            ServerName      domain.com
            DocumentRoot    /path/to/my/doc/root

            RewriteEngine   On
            Include         include.d/global/rewrite.conf
            RewriteRule     ^(.*)$ http://www.domain.com$1 [R=301,L]
    </VirtualHost>
  • -n suppresses sed's default printing of every line
  • /<VirtualHost/,/<\/VirtualHost>/ is an address range
  • For each line in the range, do {p;/<\/VirtualHost>/q}:
    • Print the line
    • If the line matches <\/VirtualHost>, i.e., is the last line of the block we want, then quit

To run this with BSD sed, add one more semicolon:

sed -n '/<VirtualHost/,/<\/VirtualHost>/{p;/<\/VirtualHost>/q;}'
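
The same "print the range, then quit" behaviour can be sketched in Python, which may help when the logic has to live inside a larger script. The name first_block and the sample lines are assumptions for illustration; unlike sed, this returns the lines instead of printing them.

```python
from itertools import dropwhile


def first_block(lines):
    """Return the first VirtualHost block as a list of lines, tags included."""
    block = []
    # Skip everything before the opening tag (the start of the address range).
    for line in dropwhile(lambda l: "<VirtualHost" not in l, lines):
        block.append(line)  # corresponds to sed's "p"
        if "</VirtualHost>" in line:
            break  # corresponds to sed's "q": stop after the closing tag
    return block


# Hypothetical input standing in for the lines of a .conf file.
sample_lines = [
    "# leading comment\n",
    "<VirtualHost apache-frontend:80>\n",
    "    ServerName one.example\n",
    "</VirtualHost>\n",
    "<VirtualHost apache-frontend:80>\n",
    "    ServerName two.example\n",
    "</VirtualHost>\n",
]

print("".join(first_block(sample_lines)), end="")
```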

Upvotes: 1

Saleem

Reputation: 8978

There is no guarantee that every platform has a PCRE-compatible grep available. Instead, you can write a small script that works anywhere Python is available.

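import re, sys

# Capture everything between the opening tag's line and the closing tag.
rx = r'(?<=<VirtualHost).*?\r?\n(.*?)(?=</VirtualHost>)'

data = sys.stdin.read()

# re.DOTALL lets "." match newlines, so the lazy group can span the block.
match = re.search(rx, data, re.DOTALL)
if match:
    print(match.group(1))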

Use it like this:

cat  your_vhost_file | python search.py

where search.py is a Python file containing the script above. Running it prints the contents of the first block:

        ServerAdmin     [email protected]
        ServerName      domain.com
        DocumentRoot    /path/to/my/doc/root

        RewriteEngine   On
        Include         include.d/global/rewrite.conf
        RewriteRule     ^(.*)$ http://www.domain.com$1 [R=301,L]

Note: This script can easily be adapted to list all matching sections in the file.
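
One possible adaptation for listing every section is sketched below. It swaps re.search for re.finditer and uses a slightly different pattern without the lookarounds; the names BLOCK_RE and all_block_bodies and the sample text are assumptions, not part of the script above.

```python
import re

# Group 1 captures the lines between the opening tag's line and the closing tag.
BLOCK_RE = re.compile(r"<VirtualHost\b[^>]*>\r?\n(.*?)</VirtualHost>", re.DOTALL)


def all_block_bodies(text):
    """Return the body of every VirtualHost block found in text."""
    return [m.group(1) for m in BLOCK_RE.finditer(text)]


# Hypothetical config text with two blocks.
sample_text = (
    "<VirtualHost a:80>\n"
    "  ServerName one.example\n"
    "</VirtualHost>\n"
    "\n"
    "<VirtualHost b:80>\n"
    "  ServerName two.example\n"
    "</VirtualHost>\n"
)

for number, body in enumerate(all_block_bodies(sample_text), start=1):
    print(f"--- block {number} ---")
    print(body, end="")
```

Taking the first element of the returned list gives the same result as the single-match version, at the cost of scanning the whole file.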

Upvotes: 0

cifer

Reputation: 633

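Alternatively, you can use grep's -B option to print the lines of context before the first matching line. Here -B8 covers the eight lines that precede the closing tag in the sample block, so the count has to match the block's length: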

grep -E '</VirtualHost>' -m1 -B8 *yours.conf*

Upvotes: 1

user2021201

Reputation: 380

This one-liner pulls only the first VirtualHost block from a config file:

awk '/<VirtualHost/,/<\/VirtualHost>/{print $0} /<\/VirtualHost>/{exit}' < vhostconf

Upvotes: 2
