Vivek Kumar

Reputation: 189

How to write a better regex in python?

I have two scenarios to match. The string should contain exactly 16 hex digits (A-F, a-f, 0-9); in the first case, each pair of digits is separated by '-'.

  1. AC-DE-48-23-45-67-AB-CD
  2. ACDE48234567ABCD

I have tried r'^([0-9A-Fa-f]{16})$|(([0-9A-Fa-f]{2}\-){7}[0-9A-Fa-f]{2})$', which works fine, but I am looking for a better expression.

Upvotes: 0

Views: 48

Answers (1)

Nick

Reputation: 147166

You can simplify the regex by considering the string to be a group of two hex digits followed by an optional -, followed by 6 similar groups (i.e. if the first group had a -, the subsequent ones must too), followed by a group of 2 hex digits:

^[0-9A-Fa-f]{2}(-?)([0-9A-Fa-f]{2}\1){6}[0-9A-Fa-f]{2}$
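To make the role of the backreference concrete: group 1, (-?), captures either '-' or the empty string, and every later \1 must repeat exactly what was captured. The sketch below is only an illustration; the names HEX_ID and separator_of are made up for this example and are not part of the answer above.

```python
import re

# Same pattern as above; group 1 is the optional separator.
HEX_ID = re.compile(r'^[0-9A-Fa-f]{2}(-?)([0-9A-Fa-f]{2}\1){6}[0-9A-Fa-f]{2}$')

def separator_of(s):
    """Return the separator captured by group 1, or None if s does not match.

    Hypothetical helper, for illustration only.
    """
    m = HEX_ID.match(s)
    return None if m is None else m.group(1)

print(repr(separator_of('AC-DE-48-23-45-67-AB-CD')))  # '-'
print(repr(separator_of('ACDE48234567ABCD')))         # ''
print(repr(separator_of('AC-DE48-23-45-67-AB-CD')))   # None (separator used inconsistently)
```

Because \1 has to equal whatever group 1 captured, a string that mixes separated and unseparated pairs cannot match.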

Use of the re.I flag allows you to remove the a-f from the character classes:

^[0-9A-F]{2}(-?)([0-9A-F]{2}\1){6}[0-9A-F]{2}$

You can also simplify slightly further by replacing 0-9 with \d in the character classes (although personally I find 0-9 easier to read):

^[\dA-F]{2}(-?)([\dA-F]{2}\1){6}[\dA-F]{2}$
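One caveat with \d, assuming Python 3 and str (not bytes) patterns: \d matches any Unicode decimal digit, not only 0-9, so the two forms are not strictly equivalent unless the re.ASCII flag is added. A small sketch of the difference, using a made-up input of sixteen Arabic-Indic digits:

```python
import re

pattern = r'^[\dA-F]{2}(-?)([\dA-F]{2}\1){6}[\dA-F]{2}$'

# Sixteen copies of ARABIC-INDIC DIGIT ONE (U+0661): decimal digits, but not ASCII.
s = '\u0661' * 16

unicode_hit = re.match(pattern, s, re.I) is not None          # \d is Unicode-aware by default
ascii_hit = re.match(pattern, s, re.I | re.A) is not None     # re.A limits \d to [0-9]

print(unicode_hit, ascii_hit)  # True False
```

If the input can contain arbitrary text, either keep 0-9 in the character classes or pass re.ASCII alongside re.I.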


Sample Python code:

import re

strs = ['AC-DE-48-23-45-67-AB-CD',
        'ACDE48234567ABCD',
        'AC-DE48-23-45-67-AB-CD',
        'ACDE48234567ABC',
        'ACDE48234567ABCDE']

# Compile once; re.I makes the A-F classes case-insensitive.
pattern = re.compile(r'^[0-9A-F]{2}(-?)([0-9A-F]{2}\1){6}[0-9A-F]{2}$', re.I)

for s in strs:
    print(s, 'matched' if pattern.match(s) else "didn't match")

Output

AC-DE-48-23-45-67-AB-CD matched
ACDE48234567ABCD matched
AC-DE48-23-45-67-AB-CD didn't match
ACDE48234567ABC didn't match
ACDE48234567ABCDE didn't match
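A possible variation, not part of the tests above: in Python, $ also matches just before a trailing newline, so the anchored pattern accepts 'ACDE48234567ABCD\n'. If that matters, re.fullmatch can replace the ^ and $ anchors. The sketch below assumes Python 3.4 or later; the names HEX16 and is_valid are hypothetical.

```python
import re

# No anchors needed: fullmatch requires the whole string to match.
# (?:...) keeps the repeated pair non-capturing; \1 still refers to (-?).
HEX16 = re.compile(r'[0-9A-F]{2}(-?)(?:[0-9A-F]{2}\1){6}[0-9A-F]{2}', re.I)

def is_valid(s):
    """Hypothetical helper: True if s is 16 hex digits, optionally '-'-separated in pairs."""
    return HEX16.fullmatch(s) is not None

anchored = r'^[0-9A-F]{2}(-?)([0-9A-F]{2}\1){6}[0-9A-F]{2}$'
with_newline = 'ACDE48234567ABCD\n'

print(re.match(anchored, with_newline, re.I) is not None)  # True: $ tolerates the final newline
print(is_valid(with_newline))                               # False
print(is_valid('ac-de-48-23-45-67-ab-cd'))                  # True
```

Whether a trailing newline should be rejected depends on where the strings come from, so treat this as an option rather than a required change.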

Upvotes: 1
