Giovanni

Reputation: 212

Python regex for matching all numbers between pipe characters

Given the following string:

string = "123|1*[123;abc;3;52m;|0|62|0|0|0|12|,399|;abc"

I want to match all the numbers that sit between a pair of pipe characters.
In this case the final value of matches should be [0, 62, 0, 0, 0, 12].

So far I have tried the following regex, which only returns [0, 0, 0]:

matches = re.findall(r"\|(\d+)\|", string)

If I replace + with {1,}, it still returns only [0, 0, 0], but when I replace + with {2,} it returns [62, 12].

So I don't really understand what I'm doing wrong. Thanks for the help.

Upvotes: 0

Views: 772

Answers (4)

Akilan Manivannan

Reputation: 956

(?<=\|)\d+(?=\|)

Breaking that down:

  • (?<=\|) is a positive lookbehind: it asserts that the match must come immediately after a | character, without including that | in the match.
  • \d+ matches one or more digits; the + keeps consuming digits until a non-digit is reached.
  • (?=\|) is a positive lookahead: it asserts that a | must immediately follow, again without consuming it, so the same | can also open the next match.

Here's some boilerplate code from regex101:

import re

regex = r"(?<=\|)\d+(?=\|)"

test_str = "123|1*[123;abc;3;52m;|0|62|0|0|0|12|,399|;abc"

matches = re.finditer(regex, test_str, re.MULTILINE)

for matchNum, match in enumerate(matches, start=1):
    
    print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))
    
    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1
        
        print("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))

Here's the output:

Match 1 was found at 22-23: 0
Match 2 was found at 24-26: 62
Match 3 was found at 27-28: 0
Match 4 was found at 29-30: 0
Match 5 was found at 31-32: 0
Match 6 was found at 33-35: 12
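
If you only need the list from the question rather than the match positions, the same pattern can be passed to re.findall. This is a minimal sketch, assuming you want integers back; the int() conversion and the name numbers are my additions, not part of the regex101 output:

```python
import re

test_str = "123|1*[123;abc;3;52m;|0|62|0|0|0|12|,399|;abc"

# The pattern has no capture group, so findall returns each whole match,
# which is just the digits (the lookarounds consume nothing).
numbers = [int(m) for m in re.findall(r"(?<=\|)\d+(?=\|)", test_str)]
print(numbers)  # [0, 62, 0, 0, 0, 12]
```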

Upvotes: 1

iliass ben

Reputation: 31

I think it's because, once a match is found, the search resumes after the closing |, so the neighboring number that shares that | is skipped. Here is a workaround:

def get_nums(s):
    items = s.split('|')
    found = []
    # Skip the first and last items: they are not enclosed by two pipes.
    for item in items[1:-1]:
        if item.strip().isdigit():
            found.append(item)
    return found

Upvotes: 0

Siddharth Sundar

Reputation: 28

The problem is that once your expression matches |0|, the closing | has been consumed, so it cannot be reused as the opening | of the next number.

Try this regular expression instead: r'\|(\d+)(?=\|)'. Here, the (?=...) part is called a positive lookahead. The match succeeds only if the lookahead pattern matches at that point, but the engine does not consume those characters.
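
A short side-by-side run makes the difference visible. This is only a sketch on the string from the question; the variable names consuming and lookahead are mine:

```python
import re

string = "123|1*[123;abc;3;52m;|0|62|0|0|0|12|,399|;abc"

# Original pattern: the closing pipe is consumed, so every other number is skipped.
consuming = re.findall(r"\|(\d+)\|", string)

# Lookahead version: the closing pipe is only checked, not consumed,
# so it is still available as the opening pipe of the next match.
lookahead = re.findall(r"\|(\d+)(?=\|)", string)

print(consuming)  # ['0', '0', '0']
print(lookahead)  # ['0', '62', '0', '0', '0', '12']
```

Note that findall returns strings here; wrap each item in int() if you need numbers.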

Upvotes: 1

VPfB

Reputation: 17352

With findall, one pipe character "|" cannot belong to both the number before it and the number after it at the same time (unless a lookahead is used).

Take for example the string "|0|62|0|". The first part "|0|" matches the pattern and is added to the results. Then the pattern matching continues with the rest of the string, i.e. with 62|0|. In this substring a second match is found: |0|. The middle number 62 is never matched this way.

I would suggest splitting the string and disregarding the first and last items, because they are not between two pipe characters. Then check whether each remaining item matches "\d+". You can do it in a one-liner, but here it is divided into steps:
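
To see this in practice, you can print where each match starts and ends. A small sketch using re.finditer on the short example string (the names s and spans are just for illustration):

```python
import re

s = "|0|62|0|"

# Each span is (start, end); the next search begins at the previous end,
# so the pipe at index 2 is already used up when "62" is reached.
spans = [(m.span(), m.group()) for m in re.finditer(r"\|(\d+)\|", s)]
for span, text in spans:
    print(span, text)
# (0, 3) |0|
# (5, 8) |0|
```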

import re

s1 = "123|1*[123;abc;3;52m;|0|62|0|0|0|12|,399|;abc"
s2 = s1.split('|')
# ['123', '1*[123;abc;3;52m;', '0', '62', '0', '0', '0', '12', ',399', ';abc']
s3 = s2[1:-1]  # drop the first and last items
s4 = [s for s in s3 if re.fullmatch(r'\d+', s)]
# ['0', '62', '0', '0', '0', '12']
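
For reference, the one-liner could look roughly like this. It is a sketch of the same split-and-filter steps; the int() conversion is an extra assumption on my part, added because the question asks for numbers:

```python
import re

s1 = "123|1*[123;abc;3;52m;|0|62|0|0|0|12|,399|;abc"

# Split on pipes, drop the two outer items, keep only all-digit items.
numbers = [int(s) for s in s1.split("|")[1:-1] if re.fullmatch(r"\d+", s)]
print(numbers)  # [0, 62, 0, 0, 0, 12]
```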

Upvotes: 0
