user881300

Reputation: 212

Python Regex match a mac address from the end?

I have the following regex to extract a MAC address:

re.sub( r'(\S{2,2})(?!$)\s*', r'\1:', '0x0000000000aa bb ccdd ee ff' )

However, this gave me 0x:00:00:00:00:00:aa:bb:cc:dd:ee:ff.

How do I modify this regex to stop after matching the first 6 pairs starting from the end, so that I get aa:bb:cc:dd:ee:ff?

Note: the string contains whitespace, which should be ignored. Only the last 12 non-whitespace characters are needed.

Edit1: re.findall( r'(\S{2})\s*(\S{2})\s*(\S{2})\s*(\S{2})\s*(\S{2})\s*(\S{2})\s*$',a) finds the last 6 pairs in the string, but I don't know how to shorten this regex, and it still depends on the characters being grouped in pairs.

Ideally the regex should take the last 12 valid \S characters, counting from the end, and join them in pairs with :

Edit2: Inspired by @Mariano's answer, which works great but depends on the last 12 characters starting with a pair, I came up with the following solution. It is kludgy, but it seems to work for all inputs.

string = '0x0000000000a abb ccddeeff'
':'.join( ''.join( i ) for i in re.findall( r'(\S)\s*(\S)(?!(?:\s*\S\s*){11})', string ) )
'aa:bb:cc:dd:ee:ff'

Edit3: @Mariano has updated his answer, which now works for all inputs.

Upvotes: 1

Views: 425

Answers (4)

Mariano

Reputation: 6511

This will work for the last 12 characters, ignoring whitespace.

Code:

import re

text = "0x0000000000aa bb ccdd ee ff"

result = re.sub( r'.*?(?!(?:\s*\S){13})(\S)\s*(\S)', r':\1\2', text)[1:]

print(result)

Output:

aa:bb:cc:dd:ee:ff

Regex breakdown:

This code uses re.sub() to replace each match of the following pattern in the subject text:

.*?                 # consume as little of the subject text as possible
(?!(?:\s*\S){13})   # CONDITION: can't be followed by 13 non-whitespace chars,
                    #  so a match can only start once 12 or fewer remain before $
(\S)\s*(\S)         # Capture a char in group 1, next char in group 2
                    #
  # The match is replaced with :\1\2
  # For this example, re.sub() returns ":aa:bb:cc:dd:ee:ff"
  # We'll then apply [1:] to the returned value to discard the leading ":"
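
The same pattern can also be written with re.VERBOSE, so the comments live inside the regex itself. This is only a sketch: the names PAIR_FROM_END and last_six_pairs are made up for illustration, and the second sample string is the unevenly spaced one from the question.

```python
import re

# Same pattern as above, spelled out with re.VERBOSE so it can carry comments.
PAIR_FROM_END = re.compile(r'''
    .*?                  # skip as little as possible
    (?!(?:\s*\S){13})    # fewer than 13 non-whitespace chars may remain
    (\S)\s*(\S)          # two chars, optionally separated by whitespace
''', re.VERBOSE)


def last_six_pairs(text):
    """Hypothetical helper: join the last 12 non-whitespace chars as a MAC."""
    # Each match becomes ":xy"; [1:] drops the leading colon.
    return PAIR_FROM_END.sub(r':\1\2', text)[1:]


print(last_six_pairs('0x0000000000aa bb ccdd ee ff'))
print(last_six_pairs('0x0000000000a abb ccddeeff'))
```

Both calls should print aa:bb:cc:dd:ee:ff. Trailing whitespace in the input is not consumed by the pattern, so strip it first if that matters.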

Upvotes: 2

Kasravnd

Reputation: 107337

You can use re.finditer to find all the pairs and then join the results:

>>> import re
>>> my_string='0x0000000000aa bb ccdd ee ff'
>>> ':'.join([i.group() for i in re.finditer( r'([a-z])\1+',my_string )])
'aa:bb:cc:dd:ee:ff'
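
Note that ([a-z])\1+ only matches a letter followed by repeats of the same letter, so it works here because every byte in the sample happens to be a doubled letter. For arbitrary hex bytes, one possible variant (an assumption that goes beyond the snippet above) is to drop the whitespace, keep the last 12 characters, and let re.finditer split them into pairs. The sample string below is made up.

```python
import re

my_string = '0x0000000000a1 b2 c3d4 e5 f6'  # made-up sample with mixed hex digits

# Remove all whitespace, then keep only the last 12 characters.
tail = re.sub(r'\s+', '', my_string)[-12:]

# finditer yields one match per pair of hex digits.
mac = ':'.join(m.group() for m in re.finditer(r'[0-9a-fA-F]{2}', tail))
print(mac)
```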

Upvotes: 1

Salo

Reputation: 2126

I know this is not a direct answer to your question, but do you really need a regular expression? If your format is fixed, this should also work:

>>> s = '0x0000000000aa bb ccdd ee ff'
>>> ':'.join([s[-16:-8].replace(' ', ':'), s[-8:].replace(' ', ':')])
'aa:bb:cc:dd:ee:ff'
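
If the spacing is not fixed, the same no-regex idea should still work once the whitespace is removed first. This is a minimal sketch, assuming only that the last 12 non-whitespace characters form the address; the function name mac_from_tail is hypothetical.

```python
def mac_from_tail(s):
    """Hypothetical helper: format the last 12 non-whitespace chars as a MAC."""
    # str.split() with no arguments splits on any run of whitespace.
    tail = ''.join(s.split())[-12:]
    # Walk the characters two at a time and join the pairs with ':'.
    return ':'.join(tail[i:i + 2] for i in range(0, len(tail), 2))


print(mac_from_tail('0x0000000000aa bb ccdd ee ff'))
print(mac_from_tail('0x0000000000a abb ccddeeff'))
```

Unlike the fixed slices above, this does not care where the spaces fall, at the cost of one extra pass over the string.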

Upvotes: 0

Avinash Raj

Reputation: 174756

You could do it in two passes: the inner re.sub keeps only the last 12 letters, and the outer one inserts the colons. The (?<=\S) lookbehind stops a second, empty match from firing right after a whitespace match, which Python 3.7+ would otherwise replace as well.

>>> import re
>>> s = '0x0000000000aa bb ccdd ee ff'
>>> re.sub(r'(?<=\S)\s*(?=(?:\s*[a-z]{2})+$)', ':', re.sub(r'.*?((?:\s*[a-z]){12})\s*$', r'\1', s ))
'aa:bb:cc:dd:ee:ff'
>>> s = '???767aa bb ccdd ee ff'
>>> re.sub(r'(?<=\S)\s*(?=(?:\s*[a-z]{2})+$)', ':', re.sub(r'.*?((?:\s*[a-z]){12})\s*$', r'\1', s ))
'aa:bb:cc:dd:ee:ff'
>>> s = '???767aa bb ccdd eeff    '
>>> re.sub(r'(?<=\S)\s*(?=(?:\s*[a-z]{2})+$)', ':', re.sub(r'.*?((?:\s*[a-z]){12})\s*$', r'\1', s ))
'aa:bb:cc:dd:ee:ff'

Upvotes: 0
