twilson

Reputation: 2062

grep output all capture groups, even without a match

I am writing a bash script that uses grep with a regular expression to parse some DNS records. The expression has four capture groups, the second of which is optional.

Is it possible to get the output so that there are always four groups, in the same order each time?

The regex I'm using with grep -Eo:

^([a-z0-9@\-\.\*]+?)\s+(?:([0-9]*)\s)?([A-Z]*)\s*(.*)$

Content to match:

www    43200 A 1.2.3.4
t        CNAME s.test.com.
blog             A 4.3.2.1
@                   MX 20 cluster3a.eu.messagelabs.com.
@                   TXT ( "some text" )

When fed into something like awk, I'd like the output for each line to be as follows:

$1 = www , $2 = 43200 , $3 = A , $4 = 1.2.3.4
$1 = t , $2 = , $3 = CNAME , $4 = s.test.com.
$1 = @ , $2 = , $3 = MX , $4 = 20 cluster3a.eu.messagelabs.com.
$1 = @ , $2 = , $3 = TXT , $4 = ( "some text" )

Just for clarity, I would like all groups to be output in order.

Tearing my hair out on this one. All help appreciated.

Upvotes: 0

Views: 424

Answers (1)

rici

Reputation: 241731

You seem to be using a Perl-style regex (for example the lazy +? and the non-capturing (?:...) group), which won't work with grep -E (you would need grep -P, which is non-standard but supported by GNU grep) and which also won't normally work with awk. Fortunately, you don't need any of those extensions.

Here's a simple regular expression which will work, in awk format, with exactly four captures:

/^([a-z0-9*@.-]+)[[:blank:]]+([0-9]*)[[:blank:]]*([A-Z]+)[[:blank:]]*(.*)/

You can use it as follows (but only with GNU awk: the three-argument form of match is a GNU extension):

awk '{match($0,
            /^([a-z0-9*@.-]+)[[:blank:]]+([0-9]*)[[:blank:]]*([A-Z]+)[[:blank:]]*(.*)/,
            field);
      print "<" field[1] "> <" field[2] "> <" field[3] "> <" field[4] ">";
     }' \
<<<'www    43200 A 1.2.3.4
t        CNAME s.test.com.
blog             A 4.3.2.1
@                   MX 20 cluster3a.eu.messagelabs.com.
@                   TXT ( "some text" )'

Output:

<www> <43200> <A> <1.2.3.4>
<t> <> <CNAME> <s.test.com.>
<blog> <> <A> <4.3.2.1>
<@> <> <MX> <20 cluster3a.eu.messagelabs.com.>
<@> <> <TXT> <( "some text" )>
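
If GNU awk is not available, the same expression can probably be applied with sed instead. This is a swapped-in technique rather than what the answer above uses, so treat it as a minimal sketch: the function name capture_fields and the | delimiter are illustrative assumptions, and sed -E is assumed to be supported (it is in GNU and BSD sed). Lines that do not match the expression are dropped.

```shell
# Hypothetical helper: rewrites each record on stdin as name|ttl|type|data.
# The ttl field is left empty when the record has no TTL.
# -n suppresses default output; the trailing p prints only lines that matched.
capture_fields() {
  sed -nE 's/^([a-z0-9*@.-]+)[[:blank:]]+([0-9]*)[[:blank:]]*([A-Z]+)[[:blank:]]*(.*)$/\1|\2|\3|\4/p'
}

capture_fields <<'EOF'
www    43200 A 1.2.3.4
t        CNAME s.test.com.
@                   MX 20 cluster3a.eu.messagelabs.com.
EOF
```

Because every output line has exactly four |-separated fields, the result can be piped into awk -F'|' and the empty TTL still occupies $2. The sketch assumes the record data never contains a | character.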

However, it might well be simpler to use awk's own line splitting facility:

awk '{  if ($2~/^[[:digit:]]+$/) { ttl=$2; type=$3; arg=4; }
       else                     { ttl=0;  type=$2; arg=3; }
       name=$1
       args=$arg
       for (++arg;arg<=NF;++arg) args=args " " $arg                   
       # ...
     }'
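
One way to fill in the elided part of the field-splitting approach is simply to print the four values with a fixed delimiter. The sketch below does that under a few assumptions that are not in the answer above: the function name split_records, the | delimiter, the NF >= 3 guard that skips short lines, and an empty TTL (instead of 0) so that the output matches what the question asks for.

```shell
# Hypothetical helper: reads records on stdin and prints name|ttl|type|data.
split_records() {
  awk '
    NF >= 3 {
      # A purely numeric second field is the TTL; otherwise it is the type.
      if ($2 ~ /^[0-9]+$/) { ttl = $2; type = $3; arg = 4 }
      else                 { ttl = "";  type = $2; arg = 3 }
      name = $1
      # Rejoin the remaining whitespace-separated fields into one data string.
      data = $arg
      for (++arg; arg <= NF; ++arg) data = data " " $arg
      print name "|" ttl "|" type "|" data
    }'
}

split_records <<'EOF'
www    43200 A 1.2.3.4
t        CNAME s.test.com.
@                   TXT ( "some text" )
EOF
```

Note that rejoining fields with a single space collapses any runs of whitespace inside the data, which matters for quoted TXT values that contain multiple spaces.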

Upvotes: 4
