I have this string:
"(a) first, (b) second, (c) important"
I'm trying to find all of the strings before the "(c) important" string, so this is my regex:
"(?:\([a-z]\) ([a-z]+), )+\([a-z]\) important"
re.findall finds only the "second" string (not the "first" string).
I tried re.finditer and the regex module (with overlapped=True), but they all return the same result.
How can I find all of the strings that come before the "important" string?
Note: the input string can vary. For example:
"(a) aa, (b) cc, (c) dd, (d) oi, (e) important" # should return ["aa", "cc", "dd", "oi"]
"(a) aa, (b) asdf, (c) wer" # should return nothing
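A quick check of the question's own pattern shows where "first" goes. There is only one overall match, and in Python's re module a capturing group inside a repeated group reports only the text from its last repetition, so group 1 holds "second":

```python
import re

s = "(a) first, (b) second, (c) important"
pattern = r"(?:\([a-z]\) ([a-z]+), )+\([a-z]\) important"

# findall returns group 1 of each match; there is a single match here.
print(re.findall(pattern, s))  # ['second']

m = re.search(pattern, s)
print(m.group(0))  # the whole string: one match covers every item
print(m.group(1))  # second: only the last repetition is kept
```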
You may use
\([a-z]\)\s+([a-z]+)(?=(?:\s*,\s*\([a-z]\)\s+[a-z]+)*\s*,\s*\([a-z]\)\s+important)
Details
\([a-z]\) - a lowercase letter inside parentheses
\s+ - 1+ whitespaces
([a-z]+) - Group 1: one or more lowercase letters
(?=(?:\s*,\s*\([a-z]\)\s+[a-z]+)*\s*,\s*\([a-z]\)\s+important) - a positive lookahead that matches a location immediately followed by:
    (?:\s*,\s*\([a-z]\)\s+[a-z]+)* - 0 or more repetitions of:
        \s*,\s* - a comma enclosed with 0+ whitespaces
        \([a-z]\) - a lowercase letter enclosed in parentheses
        \s+ - 1+ whitespaces
        [a-z]+ - 1+ lowercase letters
    \s*,\s* - a comma enclosed with 0+ whitespaces
    \([a-z]\) - a lowercase letter inside parentheses
    \s+ - 1+ whitespaces
    important - the literal word "important"

import re
strs = ["(a) first, (b) second, (c) important", "(a) aa, (b) cc, (c) dd, (d) oi, (e) important", "(a) aa, (b) asdf, (c) wer"]
r = re.compile(r'\([a-z]\)\s+([a-z]+)(?=(?:\s*,\s*\([a-z]\)\s+[a-z]+)*\s*,\s*\([a-z]\)\s+important)')
for s in strs:
    print(r.findall(s))
Output:
['first', 'second']
['aa', 'cc', 'dd', 'oi']
[]
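The same pattern can be written with re.VERBOSE so that each piece carries the explanation from the Details list as an inline comment. This is just a layout change of the regex above; in verbose mode, whitespace outside character classes is ignored and # starts a comment:

```python
import re

pattern = re.compile(r"""
    \([a-z]\)          # a lowercase letter inside parentheses
    \s+                # 1+ whitespaces
    ([a-z]+)           # Group 1: one or more lowercase letters
    (?=                # lookahead: this item must be followed by
        (?:\s*,\s*\([a-z]\)\s+[a-z]+)*   # 0+ further comma-separated items
        \s*,\s*\([a-z]\)\s+important     # and finally the important item
    )
""", re.VERBOSE)

for s in ["(a) first, (b) second, (c) important",
          "(a) aa, (b) cc, (c) dd, (d) oi, (e) important",
          "(a) aa, (b) asdf, (c) wer"]:
    print(pattern.findall(s))
```

It prints the same three lists as the compact version.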
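If the lookahead is hard to maintain, a different technique (not the one used above) is to validate the whole list once, capture everything before "(x) important" in a single group, and then run a simple findall over that prefix. A sketch under that assumption; the helper name items_before_important and the constant names are hypothetical:

```python
import re

# One "(x) word" item without a capturing group, used for validation only.
ITEM = r"\([a-z]\)\s+[a-z]+"
# Group 1 captures all items (with their commas) that precede "(x) important".
FULL = re.compile(rf"((?:{ITEM}\s*,\s*)+)\([a-z]\)\s+important")
# Capturing version that pulls the words out of the validated prefix.
WORD = re.compile(r"\([a-z]\)\s+([a-z]+)")


def items_before_important(s):
    """Hypothetical helper: words listed before '(x) important', else []."""
    m = FULL.search(s)
    if m is None:
        return []
    return WORD.findall(m.group(1))


for s in ["(a) first, (b) second, (c) important",
          "(a) aa, (b) cc, (c) dd, (d) oi, (e) important",
          "(a) aa, (b) asdf, (c) wer"]:
    print(items_before_important(s))
```

The trade-off is two regex passes instead of one, in exchange for patterns that are each easier to read.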