Richard Riverlands

Reputation: 33

Python Regex: how to parse repeated groups in a string?

I want to match a pattern in a string made up purely of digits, such as '2324235235980980'. The pattern is described below.

The pattern is '2-6-8-7-4'. It starts at 2 and transitions to 6. From 6 it can either self-loop or transition to 8. From 8 it can go back to 6 (so it can move back and forth between 6 and 8), self-loop at 8, or transition to 7. The same applies to 7: it can self-loop or go back to 8, so a path such as 7-8-6-8-7 is possible. Finally, 7 can reach 4, and once it reaches 4 the pattern is complete. If any other digit appears along the way, the pattern has to start again from 2 to be counted. So far I use:

import re
# findall returns one tuple of groups per match; the first item is the whole match
re.findall(r'(2((6+8+)+)7)', test_string)

The output includes '2666686888668887', but I don't know how to extend the pattern so that it also covers the final 4. Does anyone have an idea? Thanks a lot!
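The transitions described above can also be written down as a small table and checked without a regex. The following is a minimal sketch, assuming the reading 2 to 6, 6 to {6, 8}, 8 to {6, 8, 7}, 7 to {7, 8, 4}; the names TRANSITIONS and find_paths are hypothetical and only serve the illustration.

```python
# Allowed transitions, as read from the description (an assumption, not a confirmed spec).
TRANSITIONS = {
    "2": {"6"},
    "6": {"6", "8"},
    "8": {"6", "8", "7"},
    "7": {"7", "8", "4"},
}


def find_paths(digits):
    """Return (start, end) spans of every complete 2...4 path in digits."""
    spans = []
    start = None  # index of the 2 that opened the current path, if any
    for i, ch in enumerate(digits):
        if start is not None and ch in TRANSITIONS.get(digits[i - 1], ()):
            # Valid step; reaching 4 completes the path.
            if ch == "4":
                spans.append((start, i + 1))
                start = None
        elif ch == "2":
            # Either no path is open or the step was invalid: restart at this 2.
            start = i
        else:
            # Any other digit abandons the current path.
            start = None
    return spans


print(find_paths("2666686888668887748926874"))
```

A walker like this is easy to adjust if the allowed transitions turn out to be different, since only the table changes.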

Upvotes: 3

Views: 222

Answers (2)

wp78de

Reputation: 18950

I think this is easier to achieve than it first appears:

26[68]+?[687]+?4

That is: 2, followed by 6, followed by one or more of 6|8, followed by one or more of 6|8|7, followed by 4.

The only not-so-obvious part is making the quantifiers lazy.

Here is an even better pattern:

\b26?([^7]6|8|[^6]7)+?4\b

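That is: 2, followed by an optional 6, followed by one or more (lazily) of: a non-7 character then 6, or 8, or a non-6 character then 7; followed by 4, with word boundaries at both ends.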
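For comparison, here is a sketch that encodes the transitions from the question one by one, assuming the reading 2 to 6, 6 to {6, 8}, 8 to {6, 8, 7}, 7 to {7, 8, 4}. The names STRICT and strict_matches are hypothetical. Under that reading a string such as '26864' (6 going straight to 4) should not count, although the character-class pattern 26[68]+?[687]+?4 does accept it.

```python
import re

# Assumed transitions: 2->6, 6->{6,8}, 8->{6,8,7}, 7->{7,8,4}.
STRICT = re.compile(
    r"2"
    r"6+"            # enter 6, self-loop on 6
    r"(?:"
    r"8+(?:6+8+)*"   # reach 8, bounce between 8 and 6, ending on 8
    r"7+"            # move to 7, self-loop on 7
    r")+"            # from 7, optionally return to 8 and repeat
    r"4"             # 7 reaches 4: pattern complete
)


def strict_matches(digits):
    """Return every substring of digits that follows the assumed transitions."""
    return [m.group(0) for m in STRICT.finditer(digits)]


print(strict_matches("2666686888668887748926874"))
print(strict_matches("26864"))
```

No lazy quantifiers are needed here, because each repeated part is delimited by a different digit and the greedy match cannot run past the 4.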

Upvotes: 1

Tania Reyes

Reputation: 1

I'm not sure I understand exactly what you need, but maybe this works for you:

import re

string = "2666686888668887748926874"
# collect the (start, end) span of every match
index = [(m.start(0), m.end(0)) for m in re.finditer(r'2(6+8+)+7+\1?4', string)]
print(index)

Prints: [(0, 18), (20, 25)].

This is a list of tuples with the start and end index of every occurrence.
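If the matched text is wanted as well as its position, each span can be paired with the substring it covers. This is a small sketch using the same pattern and sample string as above; the variable names pattern and found are only for illustration.

```python
import re

string = "2666686888668887748926874"
pattern = re.compile(r'2(6+8+)+7+\1?4')

# Pair each (start, end) span with the substring it covers.
found = [(m.span(), m.group(0)) for m in pattern.finditer(string)]

for (start, end), text in found:
    # Slicing with the span gives back the same text as m.group(0).
    print(start, end, text, string[start:end] == text)
```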

Upvotes: 0
