Reputation: 69
import re
match = re.findall(r'(a)(?:.*?(b)|.*?)(?:.*?(c)|.*?)(d)?',
'axxxbxd,axxbxxcd,axxxxxd,axcxxx')
print(match)
Output: [('a', 'b', 'c', 'd'), ('a', '', 'c', '')]
I want the output to be as follows:
[('a','b','','d'),('a','b','c','d'),('a','','','d'),('a','','c','')]
Each tuple starts with 'a' and holds the 4 items found in the corresponding comma-separated part of the string.
Upvotes: 1
Views: 342
Reputation: 626728
If you want to obtain several matches from a delimited string, either split the string with the delimiters first and run your regex, or replace the . with the [^<YOUR_DELIMITING_CHARS>] (paying attention to \, ^, ] and - that must be escaped). Also note that you can get rid of redundancy in the pattern using optional non-capturing groups.
Note that I assume that a, b and c are placeholders and the real life values can be both single and multicharacter values.
import re
s = 'axxxbxd,axxbxxcd,axxxxxd,axcxxx'
r = r'(a)(?:.*?(b))?(?:.*?(c))?(d)?'
print([re.findall(r, x) for x in s.split(',')])        # split on the literal comma
print([re.findall(r, x) for x in re.split(r'\W', s)])  # split on any non-word char
# Both lines print:
# => [[('a', 'b', '', '')], [('a', 'b', 'c', 'd')], [('a', '', '', '')], [('a', '', 'c', '')]]
See the Python demo.
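The other option mentioned above, replacing . with a negated character class, could be sketched roughly like this. The DELIMS name and the use of re.escape to build the class are my own illustrative choices, not something taken from the original pattern. With this variant re.findall runs on the whole string and returns one flat list of tuples.

```python
import re

s = 'axxxbxd,axxbxxcd,axxxxxd,axcxxx'

# DELIMS is a hypothetical name for the delimiter characters (an assumption).
DELIMS = ','
# re.escape keeps \ ^ ] and - literal inside the character class
not_delim = '[^' + re.escape(DELIMS) + ']'

# Same shape as the pattern above, with . replaced by the negated class
pattern = r'(a)(?:{0}*?(b))?(?:{0}*?(c))?(d)?'.format(not_delim)

matches = re.findall(pattern, s)
print(matches)
# => [('a', 'b', '', ''), ('a', 'b', 'c', 'd'), ('a', '', '', ''), ('a', '', 'c', '')]
```

Since the negated class can never cross a delimiter, each match stays inside its own comma-separated part without any prior split.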
If your delimiters are non-word chars, use \W.
import re
s = 'axxxbxd,axxbxxcd,axxxxxd,axcxxx'
r = r'(a)(?:.*?(b)|.*?)(?:.*?(c)|.*?)(d)?'
print([re.findall(r, x) for x in s.split(',')])
print([re.findall(r, x) for x in re.split(r'\W', s)])
# => [[('a', 'b', '', '')], [('a', 'b', 'c', 'd')], [('a', '', '', '')], [('a', '', 'c', '')]]
See the Python demo
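Note that (d)? fills Group 4 only when d comes right after the previous match, so the first and third results end with an empty item, while the question expects 'd' there. One possible adjustment, which is my assumption about the intent and not part of the patterns shown here, is to let d follow any amount of text, just like b and c:

```python
import re

s = 'axxxbxd,axxbxxcd,axxxxxd,axcxxx'

# Assumed variant: (d)? becomes (?:.*?(d))? so that d may appear
# anywhere after the previous match, like b and c.
r = r'(a)(?:.*?(b))?(?:.*?(c))?(?:.*?(d))?'

# Take the first match per comma-separated part, or a blank tuple if 'a' is missing
blank = ('', '', '', '')
result = [next(iter(re.findall(r, x)), blank) for x in s.split(',')]
print(result)
# => [('a', 'b', '', 'd'), ('a', 'b', 'c', 'd'), ('a', '', '', 'd'), ('a', '', 'c', '')]
```

This still expects the values in the order a, b, c, d within each part.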
If the strings can contain line breaks, pass the re.DOTALL modifier to the re.findall calls.
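A small sketch of the difference the flag makes. The input string and the ';' delimiter are made up for illustration (a comma would work the same way; \W is avoided here because it would also split on the line break).

```python
import re

# Hypothetical input: the first part contains a line break before b
s = 'axx\nxbxd;axcxxx'
r = r'(a)(?:.*?(b))?(?:.*?(c))?(d)?'

parts = s.split(';')  # ';' is the assumed delimiter in this example

without_flag = [re.findall(r, x) for x in parts]
with_flag = [re.findall(r, x, re.DOTALL) for x in parts]

print(without_flag)  # => [[('a', '', '', '')], [('a', '', 'c', '')]]
print(with_flag)     # => [[('a', 'b', '', '')], [('a', '', 'c', '')]]
```

Without the flag, .*? cannot cross the line break, so b in the first part is never reached.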
Pattern details
(a) - Group 1 capturing a
(?:.*?(b))? - an optional non-capturing group matching a sequence of:
    .*? - any char (other than line break chars if the re.S / re.DOTALL modifier is not used), zero or more occurrences, but as few as possible
    (b) - Group 2: a b value
(?:.*?(c))? - an optional non-capturing group matching a sequence of:
    .*? - any char (other than line break chars if the re.S / re.DOTALL modifier is not used), zero or more occurrences, but as few as possible
    (c) - Group 3: a c value
(d)? - Group 4 (optional): a d.
Upvotes: 1
Reputation: 92854
Assuming that the key sequence a ... b ... c ... d always appears in strict order, a straightforward membership test is enough:
import re

s = 'axxxbxd,xxbxxcxxd,xxbxxxd|axcxxx'  # extended example
result = []
for seq in re.split(r'\W', s):  # split by non-word character
    # keep each key character if it occurs in the part, else ''
    result.append([c if c in seq else '' for c in ('a', 'b', 'c', 'd')])
print(result)
The output:
[['a', 'b', '', 'd'], ['', 'b', 'c', 'd'], ['', 'b', '', 'd'], ['a', '', 'c', '']]
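The membership test itself does not check the order of the characters. If the order cannot be taken for granted, a sketch along these lines could enforce it; ordered_hits is a hypothetical helper name and the extra 'dxxa' part is added only to show the difference.

```python
import re

def ordered_hits(seq, keys=('a', 'b', 'c', 'd')):
    """Keep each key only if it occurs after the previously found key."""
    hits, pos = [], 0
    for key in keys:
        idx = seq.find(key, pos)  # search only to the right of the last hit
        if idx == -1:
            hits.append('')
        else:
            hits.append(key)
            pos = idx + len(key)
    return hits

s = 'axxxbxd,xxbxxcxxd,xxbxxxd|axcxxx,dxxa'  # last part has d before a
ordered = [ordered_hits(seq) for seq in re.split(r'\W', s)]
print(ordered)
# => [['a', 'b', '', 'd'], ['', 'b', 'c', 'd'], ['', 'b', '', 'd'], ['a', '', 'c', ''], ['a', '', '', '']]
```

For 'dxxa' the plain membership test would report both 'a' and 'd', while this version drops the d because it comes before the a.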
Upvotes: 1