Reputation: 85
So here's the Regular Expression I have so far.
r"(?s)(?<=([A-G][1-3])).*?(?=[A-G][1-3]|$)"
It looks behind for a letter A-G followed by a digit 1-3, and looks ahead for the same pattern (or the end of the string). I've tested it using Regex101. Here's what it returns for each match
This is the string I'm testing it against,
"A1 **ACBFEKJRQ0Z+-** F2 **.,12STLMGHD** F1 **9)(** D2 **!?56WXP** C1 **IONVU43\"\'** E1 **Y87><** A3 **-=.,\'\"!?><()@**"
(The actual string has no spaces; I only added them so I could embolden the values between each letter+digit pair, which makes it easier to see what I want.)
What I want is to store both the text between the group matches (the "Full Matches") and the group match that each piece of text follows, so I can use them later.
In the end I would like to end up with either a dictionary or a list of tuples, for example:
dict = {"A1": "ACBFEKJRQ0Z+-", "F2": ".,12STLMGHD", "F1": "9)(", "next group match": "characters that follow"}
or
list_of_tuples = [("A1", "ACBFEKJRQ0Z+-"), ("F2", ".,12STLMGHD"), ("F1", "9)("), ("next group match", "characters that follow")]
By the way, the string being matched against the regex will never contain something like "C1F2".
P.S. Excuse the terrible explanation; any help is greatly appreciated.
Upvotes: 2
Views: 65
Reputation: 626738
I suggest
(?s)([A-G][1-3])((?:(?![A-G][1-3]).)*)
See the regex demo
The (?s) flag lets . match line breaks, ([A-G][1-3]) captures the uppercase letter+digit pair into Group 1, and ((?:(?![A-G][1-3]).)*) captures into Group 2 all the text up to the next uppercase letter+digit pair (or the end of the string).
The same regex can be unrolled as ([A-G][1-3])([^A-G]*(?:[A-G](?![1-3])[^A-G]*)*) for better performance (no re.DOTALL modifier or (?s) is necessary with it, since a negated character class already matches line breaks). See this demo.
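To check that the two patterns really behave the same way, here is a minimal sketch that runs both against a shortened version of the sample string. The names TEMPERED, UNROLLED and sample are only illustrative, and the sample assumes the input has no spaces, as stated in the question.

```python
import re

# Tempered version: consume one character at a time, as long as it
# does not start a new letter+digit key.
TEMPERED = re.compile(r"(?s)([A-G][1-3])((?:(?![A-G][1-3]).)*)")

# Unrolled version: consume runs of non-A-G characters, then a single
# A-G letter only if it is not followed by a digit 1-3.
UNROLLED = re.compile(r"([A-G][1-3])([^A-G]*(?:[A-G](?![1-3])[^A-G]*)*)")

# Shortened sample, assumed to contain no spaces.
sample = "A1ACBFEKJRQ0Z+-F2.,12STLMGHDF19)(D2!?56WXP"

tempered_pairs = TEMPERED.findall(sample)
unrolled_pairs = UNROLLED.findall(sample)

print(tempered_pairs)
print(tempered_pairs == unrolled_pairs)
```

Both calls should produce the same list of (key, value) tuples; whether the unrolled form is actually faster on your data is something to measure rather than assume.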
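import re

# Group 1: the letter+digit key; Group 2: everything up to the next key
regex = r"(?s)([A-G][1-3])((?:(?![A-G][1-3]).)*)"
test_str = """A1ACBFEKJRQ0Z+-F2.,12STLMGHDF19)(D2!?56WXPC1IONVU43"'E1Y87><A3-=.,'"!?><()@"""
# findall returns (key, value) tuples, which dict() turns into a mapping
dct = dict(re.findall(regex, test_str))
print(dct)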
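If you would rather have the list of tuples from the question, a sketch along these lines should work. The helper name split_pairs is hypothetical, and the second input with a repeated key is made up purely to illustrate the difference between the two containers.

```python
import re

PAIR = re.compile(r"(?s)([A-G][1-3])((?:(?![A-G][1-3]).)*)")

def split_pairs(text):
    """Return (key, value) tuples in order of appearance."""
    return [(m.group(1), m.group(2)) for m in PAIR.finditer(text)]

# Shortened sample from the question, assumed to contain no spaces.
pairs = split_pairs("A1ACBFEKJRQ0Z+-F2.,12STLMGHDF19)(")
print(pairs)

# Made-up input in which the key A1 appears twice.
repeated = split_pairs("A1XYZF2.,12A1QRS")
print(repeated)        # the list keeps both A1 entries
print(dict(repeated))  # the dict keeps only the last value for A1
```

The list preserves order and every occurrence, while a dict keeps only the last value seen for a repeated key, so the list is the safer choice if a key can appear more than once.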
Upvotes: 1