Python regex: Lookbehind + Lookahead with character set

Question

I would like to get the string 10M5D8P into a dictionary:

M:10, D:5, P:8 etc. ...

The string could be longer, but it's always a number followed by a single letter from this alphabet: MIDNSHP=X

As a first step I wanted to split the string with a lookbehind and lookahead, in both cases matching this regex: [0-9]+[MIDNSHP=X]

My current, non-working attempt looks like this:

import re

re.compile("(?<=[0-9]+[MIDNSHP=X])(?=[0-9]+[MIDNSHP=X])").split("10M5D8P")

It gives me an error message that I do not understand: "look-behind requires fixed-width pattern"

Avinash Raj · Accepted Answer

You may use re.findall.

>>> import re
>>> s = "10M5D8P"
>>> {i[-1]:i[:-1] for i in re.findall(r'[0-9]+[MIDNSHP=X]', s)}
{'M': '10', 'P': '8', 'D': '5'}
>>> {i[-1]:int(i[:-1]) for i in re.findall(r'[0-9]+[MIDNSHP=X]', s)}
{'M': 10, 'P': 8, 'D': 5}
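One caveat with a plain dictionary: if a letter can occur more than once in the string (the question does not say either way), later values overwrite earlier ones. A minimal sketch, assuming repeated letters should be summed; the helper name parse_counts is made up for illustration:

```python
import re
from collections import defaultdict

# Hypothetical helper, not part of the original answer.
# Two capture groups: the number, then the single letter.
TOKEN = re.compile(r'([0-9]+)([MIDNSHP=X])')

def parse_counts(s):
    """Sum the numbers attached to each letter in s."""
    totals = defaultdict(int)
    for number, letter in TOKEN.findall(s):
        totals[letter] += int(number)
    return dict(totals)

print(parse_counts("10M5D8P"))
print(parse_counts("10M5D8M"))
```

With "10M5D8M", the two M entries add up to 18 instead of the second replacing the first. If the order of the pairs matters, keeping the list of (number, letter) tuples from findall may be a better fit than a dictionary.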

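Your regex won't work because the re module doesn't support variable-length lookbehind assertions: [0-9]+ can match any number of digits, so the lookbehind has no fixed width. Also, before Python 3.7, re.split would not split on a zero-width match, so a pattern such as (?<=\d)(?=[A-Z]) would not work there either.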
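If you still want the split from the question, a lookbehind on just the letter has a fixed width of one character, so it compiles. A minimal sketch, assuming Python 3.7 or later (where re.split accepts patterns that can match an empty string):

```python
import re

s = "10M5D8P"

# The lookbehind checks only the single letter (fixed width);
# the lookahead checks that a digit follows.
parts = re.split(r'(?<=[MIDNSHP=X])(?=[0-9])', s)
print(parts)  # ['10M', '5D', '8P'] on Python 3.7+

# The pattern from the question fails at compile time,
# because [0-9]+ inside the lookbehind has no fixed width.
try:
    re.compile(r'(?<=[0-9]+[MIDNSHP=X])(?=[0-9]+[MIDNSHP=X])')
except re.error as exc:
    print(exc)
```

On older Python versions the split would not take effect, so re.findall remains the more portable choice.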
