Reputation: 1127
I would like to get the string 10M5D8P into a dictionary:
M:10, D:5, P:8 etc. ...
The string could be longer, but it's always a number followed by a single letter from this alphabet: MIDNSHP=X
As a first step I wanted to split the string with a lookbehind and lookahead, in both cases matching this regex: [0-9]+[MIDNSHP=X]
My current, non-working attempt looks like this:
import re
re.compile("(?<=[0-9]+[MIDNSHP=X])(?=[0-9]+[MIDNSHP=X])").split("10M5D8P")
It gives me an error message that I do not understand: "look-behind requires fixed-width pattern"
Upvotes: 0
Views: 217
Reputation: 179402
look-behind requires fixed-width pattern
means exactly what it says: a look-behind pattern must match a fixed number of characters in the Python engine. In particular, it may not contain variable-width quantifiers such as ?, + or *. Thus, we should pick a fixed-width piece to use as our lookbehind:
(?<=[MIDNSHP=X])(?=\d)
This uses just the single letter as the lookbehind and a single digit as the lookahead. However, if you try to split with this expression on a Python version before 3.7, it will not work because of Python bug 3262 (re.split does not split on empty matches there). On those versions you need a workaround like this instead:
>>> re.compile(r"(?<=[MIDNSHP=X])(?=\d)").sub('|', '10M5D8P').split("|")
['10M', '5D', '8P']
but this is pretty ugly. A simpler solution is to use findall to extract what you want:
>>> re.findall('([0-9]+)([MIDNSHP=X])', '10M5D8P')
[('10', 'M'), ('5', 'D'), ('8', 'P')]
from which you can pretty easily create a dictionary:
>>> {k:int(v) for v,k in re.findall('([0-9]+)([MIDNSHP=X])', '10M5D8P')}
{'P': 8, 'M': 10, 'D': 5}
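The findall approach above can be wrapped in a small helper. This is only a sketch: the name parse_ops and the validation step are assumptions added for illustration, not part of the original answer. It checks that the whole string consists of number-plus-letter tokens before building the dictionary.

```python
import re

# One token: a run of digits followed by a single letter from the alphabet.
TOKEN = re.compile(r"([0-9]+)([MIDNSHP=X])")
# The whole input must be one or more such tokens and nothing else.
WHOLE = re.compile(r"(?:[0-9]+[MIDNSHP=X])+")


def parse_ops(text):
    """Hypothetical helper: map each letter to the number in front of it."""
    if not WHOLE.fullmatch(text):
        raise ValueError(f"not a sequence of <number><letter> tokens: {text!r}")
    # Note: a dict keeps one value per key, so if a letter occurs twice
    # the later number overwrites the earlier one.
    return {letter: int(count) for count, letter in TOKEN.findall(text)}


print(parse_ops("10M5D8P"))
```

If the same letter can appear more than once in your input, a list of (letter, number) pairs may be a better fit than a dictionary.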
Upvotes: 2
Reputation: 174696
You may use re.findall.
>>> import re
>>> s = "10M5D8P"
>>> {i[-1]:i[:-1] for i in re.findall(r'[0-9]+[MIDNSHP=X]', s)}
{'M': '10', 'P': '8', 'D': '5'}
>>> {i[-1]:int(i[:-1]) for i in re.findall(r'[0-9]+[MIDNSHP=X]', s)}
{'M': 10, 'P': 8, 'D': 5}
Your regex won't work because the re module doesn't support variable-length lookbehind assertions. Before Python 3.7 it also doesn't support splitting on a zero-width boundary, so splitting on (?<=\d)(?=[A-Z]) isn't possible there either.
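Both points can be checked directly. The sketch below assumes a helper named compiles (hypothetical, introduced here only for the demonstration) and guards the split on the interpreter version, since re.split only splits on empty matches from Python 3.7 onward.

```python
import re
import sys

# The pattern from the question: the lookbehind contains "+", so its width varies.
variable_width = r"(?<=[0-9]+[MIDNSHP=X])(?=[0-9]+[MIDNSHP=X])"
# A fixed-width alternative: exactly one letter behind, one digit ahead.
fixed_width = r"(?<=[MIDNSHP=X])(?=\d)"


def compiles(pattern):
    """Hypothetical helper: return True if the re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


variable_ok = compiles(variable_width)  # rejected: look-behind is not fixed-width
fixed_ok = compiles(fixed_width)        # accepted

parts = None
if sys.version_info >= (3, 7):
    # On 3.7+ re.split splits at the zero-width positions between tokens.
    parts = re.split(fixed_width, "10M5D8P")

print(variable_ok, fixed_ok, parts)
```

On older interpreters the findall approach shown earlier avoids the issue entirely, since it never needs a zero-width split.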
Upvotes: 2