RangerRanger

Reputation: 2493

Regular expression for validating long, complicated DNS targets

The DNS entries I am trying to validate are quite long. Here's an example of the structure:

qwer-0123a4bcd567890e1-uuuuu3xx.qwer-gfd-1e098765dcb4a3210.ps-sdlk-6.qwer.domain.com

These entries can be thought of as three distinct parts:

  1. qwer-0123a4bcd567890e1-uuuuu3xx.qwer-gfd-1e098765dcb4a3210.

    • Always starts with qwer-
    • Followed by 17 alphanumerics, a -, 8 more alphanumerics and a .
    • Followed by qwer-gfd-
    • Followed by 17 more alphanumerics and a .
  2. ps-sdlk-6

    • Always starts with ps-sdlk-
    • Followed by either one or two alphanumerics. In this case it could be ps-sdlk-6 or something like ps-sdlk-6e
  3. .qwer.domain.com

    • The domain target always ends with .qwer.domain.com

I've been hacking together a regex and came up with this monstrosity:

qwer-[\w]{17}-[\w]{8}.qwer-gfd-[\w]{17}.(.*)(qwer.domain.com)

That solution is pretty hideous, and it returns multiple match groups, which doesn't give me much confidence in its accuracy. I'm using Ruby 2.5, and anything outside the standard library is difficult to import in this case.

Is there a more sensible and complete/accurate regex to confirm the validity of these DNS targets? Is there a better way to do this without regex?

Upvotes: 1

Views: 94

Answers (2)

Cary Swoveland

Reputation: 110735

Considering the complexity of testing longish regular expressions, and also the possibility--if not the probability--that changes will be needed in future, I would be inclined to split the string on hyphens and test each string in the resulting array against its own short regular expression.

PIECES = [['qwer'],
          ['0123a4bcd567890e1'.size],
          ['uuuuu3xx'.size, '.qwer'],
          ['gfd'],
          ['1e098765dcb4a3210'.size, '.ps'],
          ['sdlk'],
          [[1, 2], '.qwer.domain.com']].
  map do |a|
    Regexp.new(
      a.each_with_object('\A') do |o,s|
        case o
        when String
          s << o.gsub('.', '\.')
        when Integer
          s << "\\p{Alnum}{#{o}}"
        else # array
          s << "\\p{Alnum}{#{o.first},#{o.last}}"
        end
      end << '\z')
  end
  #=> [/\Aqwer\z/, /\A\p{Alnum}{17}\z/, /\A\p{Alnum}{8}\.qwer\z/,
  #    /\Agfd\z/, /\A\p{Alnum}{17}\.ps\z/, /\Asdlk\z/,
  #    /\A\p{Alnum}{1,2}\.qwer\.domain\.com\z/]

Notice that I've used single quotes in most places so that I can write, for example, '\A' rather than "\\A". Double quotes are needed only on the two lines where interpolation is performed (#{o}). I've also used strings from the example to determine the lengths of the various runs of alphanumeric characters, and the code escapes the periods and adds the anchors. I did that both to reduce the chance of counting errors and to help readers of the code understand what is being done. The fact that the elements of PIECES (regular expressions) are used here to test the very string from which PIECES was built is irrelevant, provided, as we must assume, that all strings to be tested follow the same pattern.
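The quoting points above can be checked directly in irb. This is only a small sketch: the variable names are illustrative, and Regexp.escape is shown as a standard-library alternative to the hand-written gsub, not as something the code above uses.

```ruby
single = '\A'                    # two characters: a backslash and "A"
double = "\\A"                   # the same two characters in double quotes
same   = (single == double)      #=> true

n = 17                           # illustrative run length
fragment = "\\p{Alnum}{#{n}}"    # interpolation requires double quotes

by_gsub   = '.qwer'.gsub('.', '\.')   # escape the period by hand
by_escape = Regexp.escape('.qwer')    # let the standard library do it
agree     = (by_gsub == by_escape)    #=> true
```

Note that Regexp.escape also escapes other metacharacters (including -), which makes no difference for the literal strings used here.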

def valid?(str)
  # A limit of -1 keeps trailing empty strings, so a string ending in "-" is rejected
  arr = str.split('-', -1)
  return false unless arr.size == PIECES.size
  arr.zip(PIECES).all? { |s,r| s.match? r }
end

If the block passed to all? returns a falsy value for some element, all? immediately returns false without examining the remaining elements. This is sometimes referred to as short-circuiting behaviour.
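The short-circuiting can be observed by recording which elements the block actually sees. The sample array and the variable names below are made up for illustration.

```ruby
checked = []
result = %w[qwer 123 gfd].all? do |s|
  checked << s                 # record every element the block is called with
  s.match?(/\A[a-z]+\z/)       # true only for all-lowercase-letter strings
end

result   #=> false
checked  #=> ["qwer", "123"]   ("gfd" is never examined)
```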

For the string given in the example, str,

valid?(str)
  #=> true

Note the following intermediate calculation.

str.split('-').zip(PIECES)
  #=> [["qwer", /\Aqwer\z/],
  #    ["0123a4bcd567890e1", /\A\p{Alnum}{17}\z/],
  #    ["uuuuu3xx.qwer", /\A\p{Alnum}{8}\.qwer\z/],
  #    ["gfd", /\Agfd\z/],
  #    ["1e098765dcb4a3210.ps", /\A\p{Alnum}{17}\.ps\z/],
  #    ["sdlk", /\Asdlk\z/],
  #    ["6.qwer.domain.com", /\A\p{Alnum}{1,2}\.qwer\.domain\.com\z/]]

This may seem overkill (and I'm not certain it isn't), but it does facilitate debugging and testing. If the string pattern changes in future (within limits), it should be relatively easy to modify the matching test by changing the array of arrays from which PIECES is derived.
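One way the piecewise approach might help with debugging is to report which piece failed rather than just returning false. The sketch below is an assumption-laden illustration, not part of the answer's code: PIECE_REGEXES and first_mismatch are hypothetical names, and the regexes are the seven shown above, written out literally so the snippet runs on its own.

```ruby
# The seven regexes produced above, written out literally (hypothetical name).
PIECE_REGEXES = [
  /\Aqwer\z/, /\A\p{Alnum}{17}\z/, /\A\p{Alnum}{8}\.qwer\z/,
  /\Agfd\z/, /\A\p{Alnum}{17}\.ps\z/, /\Asdlk\z/,
  /\A\p{Alnum}{1,2}\.qwer\.domain\.com\z/
].freeze

# Hypothetical helper: returns nil when every piece matches, otherwise a
# message describing the first problem found.
def first_mismatch(str)
  arr = str.split('-', -1)     # -1 keeps trailing empty strings
  unless arr.size == PIECE_REGEXES.size
    return "expected #{PIECE_REGEXES.size} pieces, got #{arr.size}"
  end
  arr.zip(PIECE_REGEXES).each_with_index do |(s, r), i|
    return "piece #{i} (#{s.inspect}) does not match #{r.inspect}" unless s.match?(r)
  end
  nil
end

good = 'qwer-0123a4bcd567890e1-uuuuu3xx.qwer-gfd-1e098765dcb4a3210.ps-sdlk-6.qwer.domain.com'
bad  = good.sub('gfd', 'gfx')  # corrupt the fourth piece (index 3)

first_mismatch(good)  #=> nil
first_mismatch(bad)   #=> a message naming piece 3 ("gfx")
```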

Upvotes: 2

Nick

Reputation: 147216

I think that, given your input data, you have no choice but to use an ugly regex, e.g.

^qwer-\w{17}-\w{8}\.qwer-gfd-\w{17}\.ps-sdlk-\w{1,2}\.qwer\.domain\.com$

Note that I have used \w as you did, but \w matches _ as well as alphanumeric characters, so you may want to replace it with [A-Za-z0-9]. Also, an unescaped . matches any character, so to match a literal . you need \. in your regex.
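For use from Ruby, the pattern might be written along the following lines. This is a sketch under a few assumptions: DNS_TARGET and dns_target? are hypothetical names, \w has been replaced with [A-Za-z0-9] as suggested above, and the anchors are \A and \z because in Ruby ^ and $ match at line boundaries, so a multi-line string containing one valid line would otherwise pass. The x flag only allows whitespace and comments inside the pattern.

```ruby
# Hypothetical constant: the regex above, spread over several lines with /x.
DNS_TARGET = /
  \A
  qwer-[A-Za-z0-9]{17}-[A-Za-z0-9]{8}   # qwer-, 17 alphanumerics, -, 8 more
  \.qwer-gfd-[A-Za-z0-9]{17}            # .qwer-gfd- and 17 alphanumerics
  \.ps-sdlk-[A-Za-z0-9]{1,2}            # .ps-sdlk- and one or two alphanumerics
  \.qwer\.domain\.com                   # fixed suffix
  \z
/x

# Hypothetical helper; String#match? is available from Ruby 2.4.
def dns_target?(str)
  DNS_TARGET.match?(str)
end

example = 'qwer-0123a4bcd567890e1-uuuuu3xx.qwer-gfd-1e098765dcb4a3210.ps-sdlk-6.qwer.domain.com'

dns_target?(example)                           #=> true
dns_target?(example.sub('sdlk-6', 'sdlk-6e'))  #=> true
dns_target?(example.sub('sdlk-6', 'sdlk-6ef')) #=> false
dns_target?("#{example}\nanything")            #=> false
```

Because match? returns only true or false, the question of multiple match groups does not arise.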

Demo on regex101.com

Upvotes: 1
