morfys

Reputation: 2415

lookahead assertions

I'm trying to match a label within a valid domain name using a regular expression in Python:

DOMAIN_LABEL_RE = """
\A(
(?<![\d\-]) # cannot start with digit or hyphen, looking behind
([a-zA-Z\d\-]*?)
([a-zA-Z]+)# need at least 1 letter
([a-zA-Z\d\-]*?)
(?!\-) # cannot end with a hyphen, looking ahead
)\Z
"""

I'm trying to use a negative lookbehind and a negative lookahead to disallow a hyphen at the beginning or end of the label.

But the string "-asdf" still matches: re.match(DOMAIN_LABEL_RE, "-asdf", re.VERBOSE).group()

I don't understand why it's still matching.

Thanks for any help.

M.

Upvotes: 1

Views: 594

Answers (1)

Felix Kling

Reputation: 816302

\A matches the start of the string, and the negative lookbehind that follows succeeds if there is no digit or hyphen before that position.

You are at the beginning of the string, so of course there is no character before it! The lookbehind therefore always succeeds and rejects nothing.
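To see this in isolation, here is a minimal sketch (the name lookbehind_at_start is only for illustration, and the pattern is a stripped-down version of the one in the question). With the lookbehind placed right after \A, labels that start with a hyphen or a digit are accepted just like any other:

```python
import re

# Negative lookbehind placed right after \A: there is never a character
# before position 0, so the assertion cannot fail.
lookbehind_at_start = re.compile(r"\A(?<![\d\-])[a-zA-Z\d\-]+\Z")

for label in ("-asdf", "1abc", "asdf"):
    # Every label matches, including the two that should be rejected.
    print(label, bool(lookbehind_at_start.match(label)))
```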

Use a negative lookahead instead: (?![\d\-]).

The same applies at the end of the string: a lookahead placed just before \Z has no character left to look at, so use a negative lookbehind there instead: (?<!\-).
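Applied to the pattern from the question, the two swaps might look like the sketch below. FIXED_RE is a hypothetical name, and the capture groups are dropped for brevity. The lookahead inspects the first character and the lookbehind inspects the last one:

```python
import re

FIXED_RE = re.compile(r"""
    \A
    (?![\d\-])          # lookahead: first character is not a digit or hyphen
    [a-zA-Z\d\-]*?
    [a-zA-Z]+           # need at least 1 letter
    [a-zA-Z\d\-]*?
    (?<!\-)             # lookbehind: last character is not a hyphen
    \Z
""", re.VERBOSE)

for label in ("asdf", "as-df", "-asdf", "asdf-", "1abc"):
    print(label, bool(FIXED_RE.match(label)))
```

Only "asdf" and "as-df" should be accepted here.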

I think an equivalent expression to your current one would be:

DOMAIN_LABEL_RE = r"""
(?i:              # case insensitive (scoped inline flag, Python 3.6+)
  \A(
    ([a-z])       # need at least 1 letter and cannot start with digit or hyphen
    ([a-z\d-]*?)
    (?<!-)        # cannot end with a hyphen
  )\Z
)
"""

Note: I did not check whether the expression is actually suited for the problem you are trying to solve.
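As a rough spot check of the equivalence (not a proof), the sketch below compares the lookaround version with the simplified one on a handful of ASCII samples. The names LOOKAROUND_RE and SIMPLE_RE and the sample list are made up for this example, and re.IGNORECASE is passed as a flag rather than written inline:

```python
import re

# The question's pattern with the assertions swapped as described above.
LOOKAROUND_RE = re.compile(
    r"\A(?![\d\-])[a-zA-Z\d\-]*?[a-zA-Z]+[a-zA-Z\d\-]*?(?<!\-)\Z"
)
# Simplified version: one leading letter, then letters, digits, or hyphens,
# not ending in a hyphen.
SIMPLE_RE = re.compile(r"\A[a-z][a-z\d-]*?(?<!-)\Z", re.IGNORECASE)

samples = ["asdf", "-asdf", "asdf-", "1abc", "a1-b2", "A", "", "a--b", "ab-"]
for s in samples:
    # Both patterns should agree on every sample.
    assert bool(LOOKAROUND_RE.match(s)) == bool(SIMPLE_RE.match(s)), s
    print(repr(s), bool(SIMPLE_RE.match(s)))
```

This only says the two patterns agree on these inputs. Whether either one matches the actual rules for domain labels (length limits, for example) is a separate question.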

Upvotes: 3
