humbleStrength

Reputation: 131

How to delete only exact matching word and exclude pattern with hyphen character in sed

I know there are many sed posts in this community, and I did my due diligence in evaluating them, but none proved to be the solution, or at least I could not derive one from them. I wrote a bash script for deleting resource record lines in our nameserver zone files, such as

smtp            IN      A       12.34.567.890

or

media           IN      CNAME   vertex-prd.looney.acme.global.

It works for most cases. However, during testing I found a case that I need to account for and have not been able to handle. The host label is the entry in the left-most column (smtp, media, etc.). If a host label contains the search string joined to other text by a hyphen, the sed command deletes that line too, even though the argument I pass does not contain the hyphen.

My command, example using smtp as text string which is received from user input at the cli as an arg:

sed -ie "/\bsmtp\b.*A\|A.*\bsmtp\b/d" "exampledomain.com.zonefile"

I also use an awk command to delete wildcard records:

awk -i /usr/share/awk/inplace.awk '!(/*/ && /A/ && /'"78.89.56.90"'/)' "exampledomain.com.zonefile"

I have tried modifying this command with \< and \> to mark the edges of a word, and with ^ and $, but neither worked. I have not tried using awk with match because I am not at that skill level yet.

How it is used:

-a is for A records and -c is for CNAME records. The second argument is the option argument, which is the text that the sed or awk command searches for. The third argument is the name of the domain zone file, which is the input file:

./my_script -a smtp sampledomainzone.com

or

./my_script -c media sampledomainzone.com

A set of lines I am running it against:

Fragment from sampledomainzone.com:

smtp            IN      A       67.79.4.187
smtp5           IN      A       64.132.0.84
1smtp           IN      A       324.175.85.89
devsmtp         IN      A       67.79.4.187
test-smtp       IN      A        67.79.4.187
smtp-dev        IN      A       38.74.745.33
dev-media       IN      CNAME   vertex-bogus.looney.acme.global.
media-dev       IN      CNAME   vertex-bogus.looney.acme.global.
1media          IN      CNAME   vertex-bogus.looney.acme.global.
media           IN      CNAME   vertex-bogus.looney.acme.global.

Actual results:

...
smtp5           IN      A       64.132.0.84
1smtp           IN      A       324.175.85.89
devsmtp         IN      A       67.79.4.187
...

Or with media as arg:

...
1media           IN      CNAME  vertex-prd.looney.acme.global.
...

Expected:

smtp5           IN      A       64.132.0.84
1smtp           IN      A       324.175.85.89
devsmtp         IN      A       67.79.4.187
test-smtp        IN      A        67.79.4.187
smtp-dev        IN      A       38.74.745.33

Or

dev-media       IN      CNAME   vertex-prd.looney.acme.global.
media-dev       IN      CNAME   vertex-prd.looney.acme.global.
1media           IN      CNAME  vertex-prd.looney.acme.global.

My sed command deletes the smtp or media line, but it also deletes the lines where that string is joined to other text by a hyphen (-). The other lines that contain those strings, such as smtp5 or 1media, are left alone as intended, because they have no hyphen in them.

What I have tried:

Using sed with \B to match non-word chars, in addition to word boundary matching with \b in my command.

Using [^-]* in my command.

Using awk instead, like so:

awk -i /usr/share/awk/inplace.awk '!(/\<'"${HOSTLABEL}"'\>/ && /A/)' "$DOMAINZONEFILE"

or

awk -i /usr/share/awk/inplace.awk '!(/^'"${HOSTLABEL}"'$/ && /A/)' "$DOMAINZONEFILE"

Any insight would be much appreciated.

Upvotes: 1

Views: 242

Answers (1)

humbleStrength

Reputation: 131

As per my discourse with jhnc in the comments, the answer was to approach the problem with a different sed command specification entirely.

sed -ie "/^${HOSTLABEL}[[:space:]]\{1,\}IN[[:space:]]\{1,\}A/d" "$DOMAINZONEFILE"

Without the var expansion I am using:

sed -ie "/^smtp[[:space:]]\{1,\}IN[[:space:]]\{1,\}A/d" "$DOMAINZONEFILE"
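For anyone wondering why the original command misfired: in GNU sed, \b matches between a word character (letter, digit, or underscore) and anything else, and a hyphen is not a word character. As far as \b is concerned, test-smtp and smtp-dev both contain a standalone smtp. The following is a minimal sketch, not my actual script, that runs both patterns against the A records from the question and prints what each one keeps. It assumes GNU sed (\b and \| are GNU extensions) and writes to stdout instead of editing in place. The temporary file and the variable names word_boundary_result and anchored_result exist only for illustration.

```shell
#!/usr/bin/env bash
# Sample data copied from the zone file fragment in the question.
zone=$(mktemp)
cat > "$zone" <<'EOF'
smtp            IN      A       67.79.4.187
smtp5           IN      A       64.132.0.84
1smtp           IN      A       324.175.85.89
devsmtp         IN      A       67.79.4.187
test-smtp       IN      A        67.79.4.187
smtp-dev        IN      A       38.74.745.33
EOF

HOSTLABEL=smtp

# Word-boundary pattern from the question: "-" is a non-word character,
# so \b matches next to it and the hyphenated labels are deleted too.
word_boundary_result=$(sed "/\b${HOSTLABEL}\b.*A\|A.*\b${HOSTLABEL}\b/d" "$zone")

# Anchored pattern from this answer: the label must start the line and be
# followed directly by whitespace, so only the exact label is deleted.
anchored_result=$(sed "/^${HOSTLABEL}[[:space:]]\{1,\}IN[[:space:]]\{1,\}A/d" "$zone")

printf 'word-boundary pattern keeps:\n%s\n\n' "$word_boundary_result"
printf 'anchored pattern keeps:\n%s\n' "$anchored_result"

rm -f "$zone"
```

The word-boundary pattern should keep only smtp5, 1smtp, and devsmtp, while the anchored pattern should also keep test-smtp and smtp-dev. Separately, GNU sed reads the combined flag -ie as -i with a backup suffix of e, so the in-place commands above leave behind a backup copy whose name ends in an extra e. Writing -i -e avoids that.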

I learned that \{i,\} matches i or more consecutive occurrences of the expression it follows. It is one of the BRE interval expressions, which take the form \{m\}, \{m,\}, or \{m,n\}. In my case, \{1,\} requires at least one whitespace character between the columns, which keeps the blast radius minimal.
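To illustrate the interval expression, here is a small sketch that checks three hand-made lines against the pattern with grep. The three sample lines and the variable names are assumptions for the demo and are not taken from a real zone file. It also shows the ERE spelling with +, which should behave the same way under grep -E or sed -E.

```shell
#!/usr/bin/env bash
# Three illustrative lines: space-separated, tab-separated, and one with
# no whitespace at all between the label and IN.
line_spaces='smtp            IN      A       67.79.4.187'
line_tabs=$(printf 'smtp\tIN\tA\t67.79.4.187')
line_joined='smtpIN A 67.79.4.187'

# BRE interval \{1,\} and its ERE equivalent +, both meaning "one or more".
bre='^smtp[[:space:]]\{1,\}IN[[:space:]]\{1,\}A'
ere='^smtp[[:space:]]+IN[[:space:]]+A'

count_bre=$(printf '%s\n' "$line_spaces" "$line_tabs" "$line_joined" | grep -c "$bre")
count_ere=$(printf '%s\n' "$line_spaces" "$line_tabs" "$line_joined" | grep -Ec "$ere")

echo "BRE matches: $count_bre"
echo "ERE matches: $count_ere"
```

Both counts should be 2: the space-separated and tab-separated lines match, and the line with no whitespace after the label does not.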

Also, character classes and bracket expressions are immensely useful for conciseness and readability. For example, [[:alnum:]] is equivalent to [0-9A-Za-z] in the C locale.

UPDATE:

As per user Renaud, an alternative that may be simpler and more straightforward is to use awk with no regex and no manual line splitting:

awk -i inplace '$1!="'"$HOSTLABEL"'" || $2!="IN" || $3!="A"' "$DOMAINZONEFILE"

This applies easily to my case because in BIND zone files the data columns are separated by white space, which awk uses as its default field separator when splitting each line. At least, this is what I have come to understand.
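A minimal sketch of the same field-comparison idea, with two changes that are my own assumptions and not part of Renaud's suggestion: the label is passed in with awk -v instead of being spliced into the program text, and the result goes to stdout instead of being written in place. The AAAA line is a made-up record (using a documentation address) added to show that an exact comparison on the third field tells A apart from AAAA, which the anchored sed pattern above does not do unless whitespace is also required after the A.

```shell
#!/usr/bin/env bash
zone=$(mktemp)
cat > "$zone" <<'EOF'
smtp            IN      A       67.79.4.187
smtp            IN      AAAA    2001:db8::25
test-smtp       IN      A       67.79.4.187
smtp-dev        IN      A       38.74.745.33
EOF

HOSTLABEL=smtp

# Keep every line unless field 1 is exactly the label, field 2 is exactly
# IN, and field 3 is exactly A. awk splits on runs of blanks by default.
awk_result=$(awk -v label="$HOSTLABEL" \
    '$1 != label || $2 != "IN" || $3 != "A"' "$zone")

printf '%s\n' "$awk_result"

rm -f "$zone"
```

Only the plain smtp A record should be dropped here. The AAAA record and both hyphenated labels are kept because at least one of the three string comparisons differs for each of them.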

Upvotes: 1
