Reputation: 11

Detecting a repeated sequence with regex

I have an example text like:

0s11 0s12 0s33 my name is 0sgfh 0s1 0s22 0s87

I want to detect the consecutive sequences of tokens that start with 0s.

So the expected output should be two matches: 0s11 0s12 0s33 and 0sgfh 0s1 0s22 0s87

I tried using the regex

(0s\w+)

but that matches each token (0s11, 0s12, 0s33, etc.) individually.

Any idea on how to modify the pattern?

Upvotes: 0

Answers (2)

The fourth bird

Reputation: 163207

To get those 2 matches where there are at least 2 consecutive parts:

\b0s\w+(?:\s+0s\w+)+

Explanation

\b A word boundary to prevent a partial word match
0s\w+ Match 0s and 1+ word chars
(?:\s+0s\w+)+ Repeat 1 or more times: 1+ whitespace chars followed by 0s and 1+ word chars
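The answer itself is language-agnostic; as a minimal sketch, assuming Python's standard `re` module, the pattern can be applied to the question's text like this. Because the repeated group is non-capturing `(?:...)`, `findall` returns the whole match for each run rather than just the last repetition of a group.

```python
import re

text = "0s11 0s12 0s33 my name is 0sgfh 0s1 0s22 0s87"

# \b             : word boundary, so "0s" must start a word
# 0s\w+          : "0s" followed by one or more word characters
# (?:\s+0s\w+)+  : one or more further whitespace-separated 0s-tokens
pattern = re.compile(r"\b0s\w+(?:\s+0s\w+)+")

# Non-capturing group, so findall yields the full text of each run
runs = pattern.findall(text)
print(runs)  # ['0s11 0s12 0s33', '0sgfh 0s1 0s22 0s87']
```

If the group were written as a capturing `(\s+0s\w+)+`, `findall` would return only the last captured repetition (e.g. `' 0s33'`), which is why the non-capturing form is used.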


If you also want to match a single occurrence:

\b0s\w+(?:\s+0s\w+)*


Note that \w+ matches 1 or more word characters, so a bare 0s with nothing after it is not matched.
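A small Python sketch (again assuming the standard `re` module; the sample string is made up for illustration) showing both points: with `*` a lone 0s-token is matched on its own, and a bare `0s` followed by a space is skipped because `\w+` needs at least one word character.

```python
import re

# Same pattern, but * allows zero extra tokens, so a single 0s-token also matches
pattern = re.compile(r"\b0s\w+(?:\s+0s\w+)*")

# Hypothetical sample: a single token, a bare "0s", then a run of two tokens
sample = "0sA foo 0s bar 0s1 0s2"
found = pattern.findall(sample)
print(found)  # ['0sA', '0s1 0s2']
```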

Upvotes: 1

Koedlt

Reputation: 5963

Should be doable with re.findall(). Your pattern was correct! :)

import re
testString = "0s11 0s12 0s33 my name is 0sgfh 0s1 0s22 0s87"
print(re.findall(r'0s\w+', testString))  # raw string; \w+ consumes the whole token

['0s11', '0s12', '0s33', '0sgfh', '0s1', '0s22', '0s87']
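The quantifier matters here: `\w` matches exactly one word character, while `\w+` matches one or more. A quick comparison on the same test string (a sketch; the variable names are illustrative). Raw strings are used because a plain `'0s\w'` triggers Python's invalid-escape-sequence warning in recent versions.

```python
import re

test_string = "0s11 0s12 0s33 my name is 0sgfh 0s1 0s22 0s87"

# \w matches a single word character, so each match stops after one char
one_char = re.findall(r"0s\w", test_string)
# \w+ matches one or more, so each match covers the whole token
whole_token = re.findall(r"0s\w+", test_string)

print(one_char)     # ['0s1', '0s1', '0s3', '0sg', '0s1', '0s2', '0s8']
print(whole_token)  # ['0s11', '0s12', '0s33', '0sgfh', '0s1', '0s22', '0s87']
```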

Hope this helps!

Upvotes: 0
