Ursa Major

Reputation: 901

Extracting multiple occurrences between 2 delimiters in a line

How can we generically extract the numbers into a list? The delimiters may be "(" and ")", but they could also be "[" and "]", "{" and "}", or even words such as "start" and "end".

import re

line = "-(123) = (456) = (789)-"

# Currently returns a single match spanning from the first "(" to the last ")"
result = re.findall(r"\([^']*\)", line)

for i in result:
    print(i)

We want to put the numbers, or whatever content lies between the two delimiters, into a list.

Upvotes: 0

Views: 70

Answers (2)

user557597

Reputation:

I submit there is only one easy way to do this: a two-step process.

>>> import re
>>> line = r' [one] (two) {three} startfourend '
>>> ary = re.findall(r'(\([^)]*\)|\[[^\]]*\]|{[^}]*}|start(?:(?!end)[\S\s])*end)', line)
>>> ary = [re.sub(r'^(?:[\[({]|start)|(?:[\])}]|end)$', '', element) for element in ary]
>>> print(ary)
['one', 'two', 'three', 'four']

Regex for findall, to find all the elements:

 (                             # (1 start)
      \( [^)]* \)
   |  
      \[ [^\]]* \]
   |  
      { [^}]* }
   |  
      start  
      (?:
           (?! end )
           [\S\s]    
      )*
      end
 )                             # (1 end)
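A minimal sketch, assuming Python 3 and the sample line from the session above: the spaced-out layout is not only for reading. Python's re.VERBOSE flag ignores unescaped whitespace and # comments outside character classes, so the pattern can be compiled exactly as laid out. The name FIND is only illustrative.

```python
import re

# The findall pattern from above, kept in its commented, spaced-out form.
# re.VERBOSE ignores the whitespace and the "# ..." comments.
FIND = re.compile(r"""
 (                             # (1 start)
      \( [^)]* \)
   |
      \[ [^\]]* \]
   |
      { [^}]* }
   |
      start
      (?:
           (?! end )
           [\S\s]
      )*
      end
 )                             # (1 end)
""", re.VERBOSE)

line = ' [one] (two) {three} startfourend '
print(FIND.findall(line))
# ['[one]', '(two)', '{three}', 'startfourend']
```

The elements still carry their delimiters at this point; the sub step trims them.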

Regex for sub, to trim the delimiters from each array element:

    ^ 
    (?: [\[({] | start )
 |  
    (?: [\])}] | end )
    $

If you also want to trim whitespace from the elements,
change the regex to this:

    ^ 
    (?: [\[({] | start )
    \s* 
 |  
    \s* 
    (?: [\])}] | end )
    $
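A sketch of the full two-step process using the whitespace-trimming variant, assuming padded input. The sample string and the names FIND and TRIM are made up for illustration.

```python
import re

# Step 1: the findall pattern from the session above (compact form).
FIND = re.compile(r'(\([^)]*\)|\[[^\]]*\]|{[^}]*}|start(?:(?!end)[\S\s])*end)')

# Step 2: the whitespace-trimming sub pattern, in verbose form.
# It strips the opening delimiter plus any whitespace after it,
# and any whitespace before the closing delimiter plus the delimiter itself.
TRIM = re.compile(r"""
    ^
    (?: [\[({] | start )
    \s*
 |
    \s*
    (?: [\])}] | end )
    $
""", re.VERBOSE)

line = ' [ one ] ( two ) {three } start four end '
ary = [TRIM.sub('', element) for element in FIND.findall(line)]
print(ary)
# ['one', 'two', 'three', 'four']
```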

Upvotes: 0

truth

Reputation: 1186

What you have here is a greedy match: the * matches as many characters as possible, so the match runs from the first ( to the last ) and you get one large match.

Use a non-greedy match instead: \([^']*?\)
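To see the difference, here is a quick comparison of the two patterns on the line from the question (plain Python 3 re; nothing else is assumed):

```python
import re

line = "-(123) = (456) = (789)-"

# Greedy: [^']* runs to the LAST ")" on the line, so there is one big match.
print(re.findall(r"\([^']*\)", line))
# ['(123) = (456) = (789)']

# Non-greedy: [^']*? stops at the FIRST ")" after each "(".
print(re.findall(r"\([^']*?\)", line))
# ['(123)', '(456)', '(789)']
```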

If you want to exclude the delimiters from the results, use a capturing group: \(([^']*?)\) (re.findall then returns only the group's contents).
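Since the question asks for arbitrary delimiters, one way to generalize this, sketched under my own assumptions, is to build the pattern from the two delimiter strings with re.escape(), so that characters such as ( or [ are matched literally. It swaps the [^']*? used here for .*? with re.DOTALL, so the content may contain quotes or newlines. The helper name between is hypothetical.

```python
import re

def between(text, start, end):
    """Return every substring of text found between start and end.

    Hypothetical helper: re.escape() makes the delimiters literal, so "(",
    "[", "{" or whole words like "start" all work, and the non-greedy
    capturing group (.*?) returns only the content between each pair.
    """
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    return re.findall(pattern, text, flags=re.DOTALL)

print(between("-(123) = (456) = (789)-", "(", ")"))         # ['123', '456', '789']
print(between("[1] and [22]", "[", "]"))                    # ['1', '22']
print(between("{a}{b}", "{", "}"))                          # ['a', 'b']
print(between("start x end, start y end", "start", "end"))  # [' x ', ' y ']
```

Because the group is non-greedy, each match stops at the first end delimiter after its start. Nested delimiters such as ((a)) are not handled.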

Regex101 link: https://regex101.com/r/5wYz7v/1

Upvotes: 3
