Marco

Reputation: 193

regex match where order of substrings doesn't matter

I have a problem using a regexp. I have a code in the following format:

(01)123456789(17)987654321

Now I want to capture the digits after (01) in a named group, group01, and the digits after (17) in a named group, group17.

The problem is that the code could be in a different order, like this:

(17)987654321(01)123456789

The named groups should contain the same content in both cases.

Any ideas?

Thank you, Marco

Upvotes: 1

Views: 2677

Answers (6)

redacted

Reputation: 2429

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. -- Jamie Zawinski

Glib quotes aside, regex seems like overkill. Python code:

string = "(17)987654321(01)123456789"

# Split on "(" and drop the empty piece before the first tag.
substrings = [s for s in string.split("(") if s]

results = {}

for substring in substrings:
    # "17)987654321" -> tag "17", digits "987654321"
    tag, digits = substring.split(")")
    results["group" + tag] = digits

print(results)

>>> {'group17': '987654321', 'group01': '123456789'}

Upvotes: 0

Rolice

Reputation: 3103

Are you looking for something like this?

\((01|17)\)(\d+)\((01|17)\)(\d+)

Expected matches:

0 => in most cases, the whole match
1 => 01 or 17
2 => first decimal string
3 => second 01 or 17
4 => second decimal string
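As a rough illustration of how those numbered groups could be turned into the names the question asks for, here is a minimal Python sketch. It assumes a variant of the pattern with literal parentheses around the identifier and at least one digit per field; `parse_pair` is a hypothetical helper name, not something from this answer.

```python
import re

# Assumed variant: escaped parentheses around the identifier, one or more digits.
PATTERN = re.compile(r"\((01|17)\)(\d+)\((01|17)\)(\d+)")

def parse_pair(code):
    """Return a dict keyed 'group01'/'group17', whichever order the fields come in."""
    m = PATTERN.fullmatch(code)
    if m is None:
        return None
    # Groups 1 and 3 hold the identifiers, groups 2 and 4 the digit strings.
    return {
        "group" + m.group(1): m.group(2),
        "group" + m.group(3): m.group(4),
    }

print(parse_pair("(01)123456789(17)987654321"))
print(parse_pair("(17)987654321(01)123456789"))
```

Both calls should produce the same mapping. Note that this sketch does not reject input that repeats the same identifier twice.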

Tell me if it helps.

Upvotes: 0

loosecannon

Reputation: 7803

You didn't say which language; they all have their own quirks. But something like this should work if there are always 9 digits after the (). (In Ruby.)

No groups, but it's a little clearer like this, in my opinion. It may not work for you.

string = "(01)123456789(17)987654321"
group17 = string =~ /\(17\)\d{9}/
group01 = string =~ /\(01\)\d{9}/

string[group17+4,9]
string[group01+4,9]

EDIT: with named capture groups in Ruby 1.9:

string = "(01)123456789(17)987654321"
if string =~ /\(17\)(?<g17>\d{9})/
  match = Regexp.last_match
  group17 = match[:g17]
end
if string =~ /\(01\)(?<g01>\d{9})/
  match = Regexp.last_match
  group01 = match[:g01]
end

Upvotes: 0

unpythonic

Reputation: 4070

Everyone seems to be hardcoding "01" and "17". Here's a more general solution:

my %group;
while ( $data =~ /\((\d+)\)(\d+)/g ) {
    my $group_number = $1;
    my $group_data   = $2;
    $group{$group_number} = $group_data;
}

As long as unmatched (numbers)numbers patterns remain in your data, the loop grabs each one in succession. This Perl snippet stores each group's data in a hash keyed on the group number.
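For comparison, the same idea can be sketched in Python, where `re.findall` returns every (identifier, digits) pair and `dict` keys them by identifier. The function name `parse_groups` is made up for this example.

```python
import re

def parse_groups(data):
    """Map each parenthesised identifier to the digits that follow it."""
    # With two capture groups, findall yields one (identifier, digits) tuple per match.
    return dict(re.findall(r"\((\d+)\)(\d+)", data))

groups = parse_groups("(17)987654321(01)123456789")
print(groups["01"], groups["17"])  # prints: 123456789 987654321
```

Because the result is keyed on the identifier, the order of the fields in the input does not matter.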

Upvotes: 1

tentux

Reputation: 293

This worked for me:

\(01\)(?<group01>[0-9]{9})|\(17\)(?<group17>[0-9]{9})
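An alternation like this matches one field per match, so the results have to be collected across matches. A minimal Python sketch of that step follows. It assumes a variant in which the named groups wrap the nine digits rather than the tag, and it uses Python's `(?P<name>...)` spelling; `collect_fields` is a hypothetical helper.

```python
import re

# Assumed variant: each named group captures the digits that follow its tag.
FIELD = re.compile(r"\(01\)(?P<group01>[0-9]{9})|\(17\)(?P<group17>[0-9]{9})")

def collect_fields(code):
    """Merge the named groups from every match into one dict."""
    found = {}
    for m in FIELD.finditer(code):
        # Only the branch that matched has a non-None group.
        found.update({k: v for k, v in m.groupdict().items() if v is not None})
    return found

print(collect_fields("(17)987654321(01)123456789"))
```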

Upvotes: 1

agent-j

Reputation: 27913

In PCRE and PHP (Python's re module needs (?P<name>...) in place of (?<name>...)):

(?:(?<=\(17\))(?<group17>\d+)|(?<=\(01\))(?<group01>\d+)|.)+

.NET supports the above syntax and this one:

(?:(?<=\(17\))(?'group17'\d+)|(?<=\(01\))(?'group01'\d+)|.)+
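A sketch of how this single-match pattern might look in Python, assuming the named groups are rewritten in the `(?P<name>...)` form that the `re` module expects. The wrapper `extract` is a hypothetical name for the example. The lookbehinds are fixed-width, which `re` requires.

```python
import re

# Same structure as the pattern above, with Python-style named groups.
PATTERN = re.compile(
    r"(?:(?<=\(17\))(?P<group17>\d+)|(?<=\(01\))(?P<group01>\d+)|.)+"
)

def extract(code):
    """Return both named groups from one match, or None if nothing matches."""
    m = PATTERN.fullmatch(code)
    return m.groupdict() if m else None

for code in ("(01)123456789(17)987654321", "(17)987654321(01)123456789"):
    print(extract(code))
```

The trailing `.` branch consumes the tags themselves one character at a time, so one match spans the whole string and both groups are filled in whichever order the fields appear.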

Upvotes: 1
