carlos bustamante

Reputation: 21

Python regular expression retrieving numbers between two different delimiters

I have the following string

"h=56,7,1,d=88,9,1,h=58,8,1,d=45,h=100,d=,"

I would like to use regular expressions to extract the groups:

My ultimate goal is to have tuples such as (group1, group2), (group3, group4), (group5, group6). I am not sure if this all can be accomplished with regular expressions.

I have the following regular expression, which gives me partial results:

(?<=h=|d=)(.*?)(?=h=|d=)

Each match has an extra comma at the end (for example 56,7,1,) that I would like to remove, and the final d=, does not return a null match.

Upvotes: 2

Views: 658

Answers (5)

Jan

Reputation: 43169

You could match rather than split using the expression

[dh]=([\d,]*),

and grab the first group, see a demo on regex101.com.


That is

[dh]=     # d or h, followed by =
([\d,]*)  # capture digits and commas, 0+ times
,         # require a comma afterwards


In Python:

import re

rx = re.compile(r'[dh]=([\d,]*),')

string = "h=56,7,1,d=88,9,1,h=58,8,1,d=45,h=100,d=,"

numbers = [m.group(1) for m in rx.finditer(string)]
print(numbers)

Which yields

['56,7,1', '88,9,1', '58,8,1', '45', '100', '']
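
To get from this list to the pairs the question asks for, one option is to zip consecutive matches together. This is only a minimal sketch: it assumes the matches always alternate h, d and that there is an even number of them, and the name pairs is purely illustrative.

```python
import re

rx = re.compile(r'[dh]=([\d,]*),')
string = "h=56,7,1,d=88,9,1,h=58,8,1,d=45,h=100,d=,"

numbers = [m.group(1) for m in rx.finditer(string)]

# Pair consecutive matches: (1st, 2nd), (3rd, 4th), ...
# Assumption: an even number of matches that alternate h, d.
pairs = list(zip(numbers[0::2], numbers[1::2]))
print(pairs)
# [('56,7,1', '88,9,1'), ('58,8,1', '45'), ('100', '')]
```

With an odd number of matches, zip would silently drop the last one, so that case would need separate handling.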

Upvotes: 1

Stephen Rauch

Reputation: 49804

You likely do not need regex here. A list comprehension and .split() can do what you need:

Code:

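def split_it(a_string):
    # Make sure the last group ends with a comma, like all the others.
    if not a_string.endswith(','):
        a_string += ','
    # Splitting on '=' leaves chunks shaped like "values,next_key". Dropping
    # the last comma-separated item removes that key (or the empty string
    # after the final comma); [1:] skips the chunk for the leading key.
    return [x.split(',')[:-1] for x in a_string.split('=') if len(x)][1:]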

Test Code:

tests = (
    "h=56,7,1,d=88,9,1,h=58,8,1,d=45,h=100,d=,",
    "h=56,7,1,d=88,9,1,d=,h=58,8,1,d=45,h=100",
)

for test in tests:
    print(split_it(test))

Results:

[['56', '7', '1'], ['88', '9', '1'], ['58', '8', '1'], ['45'], ['100'], ['']]
[['56', '7', '1'], ['88', '9', '1'], [''], ['58', '8', '1'], ['45'], ['100']]

Upvotes: 4

chirag sanghvi

Reputation: 892

Here, I have assumed that the parameters only have numeric values. If so, you can try this: (?<=h=|d=)([0-9,]*)

Hope it helps.
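
A minimal sketch of how that pattern might be used from Python. The character class also consumes the comma that precedes the next key, so this sketch strips trailing commas afterwards; that cleanup step and the variable names are assumptions, not part of the pattern itself.

```python
import re

input_str = "h=56,7,1,d=88,9,1,h=58,8,1,d=45,h=100,d=,"

# Each raw match still carries a trailing comma, e.g. '56,7,1,'.
raw = re.findall(r'(?<=h=|d=)([0-9,]*)', input_str)

# Strip the trailing comma; the empty d= group becomes ''.
values = [r.rstrip(',') for r in raw]
print(values)
# ['56,7,1', '88,9,1', '58,8,1', '45', '100', '']
```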

Upvotes: 0

purarue

Reputation: 2164

You could use $ in positive lookahead to match against the end of the string:

import re

input_str = "h=56,7,1,d=88,9,1,h=58,8,1,d=45,h=100,d=,"
groups = []
for x in re.findall('(?<=h=|d=)(.*?)(?=d=|h=|$)', input_str):
    m = x.strip(',')
    if m:
        groups.append(m.split(','))
    else:
        groups.append(None)

print(groups)

Output:

[['56', '7', '1'], ['88', '9', '1'], ['58', '8', '1'], ['45'], ['100'], None]

Upvotes: 0

Mohammad Javad Noori

Reputation: 1217

You can use ([a-z]=)([0-9,]+)(,)?

Online demo

You just need to select the group you want by its index.
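
As a rough illustration of selecting a group by index in Python: group 1 is the key (h= or d=) and group 2 is the numeric part. Because [0-9,]+ is greedy, group 2 keeps the trailing comma, so this sketch strips it; that extra step and the variable names are assumptions.

```python
import re

input_str = "h=56,7,1,d=88,9,1,h=58,8,1,d=45,h=100,d=,"
rx = re.compile(r'([a-z]=)([0-9,]+)(,)?')

# group(1) is the key, group(2) the values including a trailing comma.
keys = [m.group(1) for m in rx.finditer(input_str)]
values = [m.group(2).rstrip(',') for m in rx.finditer(input_str)]

print(keys)
print(values)
# ['h=', 'd=', 'h=', 'd=', 'h=', 'd=']
# ['56,7,1', '88,9,1', '58,8,1', '45', '100', '']
```

Since the class requires at least one character, a key with nothing at all after the = (not even a comma) would not match.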

Upvotes: 0
