Anjan

Reputation: 13

Searching to End of String in Regex

I am trying to extract all sequences of '1's from a string of binary digits (0 and 1) and get them into a list.
For example, the string may be of the form 001111000110000111111, and I am looking for a list like ["1111", "11", "111111"].

I am using Python's re.findall function with the pattern ([1]+?)0. However, it does not match the last sequence of 1s, since that ends at the end of the string (EOS) rather than with a '0'. I have tried ([1]+?)0|$ to capture the EOS as a valid delimiter.

But that fails too.
Any help appreciated.

Upvotes: 1

Views: 82

Answers (3)

Wiktor Stribiżew

Reputation: 626748

Matching: To match one or more 1s, use the regex 1+.

Splitting: Alternatively, split on one or more 0s and remove the empty elements.

See Python demo:

import re
s = '001111000110000111111'
print(re.findall('1+', s))                   # ['1111', '11', '111111']
print([x for x in re.split('0+', s) if x])   # ['1111', '11', '111111']

Upvotes: 0

Nehal J Wani

Reputation: 16629

What you are trying:

([1]+?)0

([1]+?)0|$

What will work:

(1+)

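
As a rough sketch of why the first two patterns fall short, they can be run side by side in Python on the string from the question. The variable names are only illustrative, and the grouped variant (1+?)(?:0|$) is an assumption about what the second attempt was meant to express, not something taken from the question.

```python
import re

s = '001111000110000111111'

# Original attempt: every run must be followed by a literal 0,
# so the final run (which ends at EOS) is never matched.
lazy_then_zero = re.findall(r'([1]+?)0', s)

# Second attempt: | has the lowest precedence, so this reads as
# "([1]+?)0" OR "$". The $ branch matches an empty string at EOS
# and leaves the group unset, which findall reports as ''.
ungrouped_alt = re.findall(r'([1]+?)0|$', s)

# Grouping the alternation makes "0 or EOS" the delimiter.
grouped_alt = re.findall(r'(1+?)(?:0|$)', s)

# No delimiter is needed at all: + is greedy and takes the whole run.
plain = re.findall(r'(1+)', s)

print(lazy_then_zero)  # ['1111', '11']
print(ungrouped_alt)   # ['1111', '11', '']
print(grouped_alt)     # ['1111', '11', '111111']
print(plain)           # ['1111', '11', '111111']
```

The grouped variant works, but since a greedy 1+ already stops at the first non-1 character or at the end of the string, the delimiter adds nothing here.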

Upvotes: 1

Keith Bailey

Reputation: 296

I think the regex you're looking for is:

1+(?!\0)

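i.e. match one or more 1s followed by a negative lookahead. (Strictly, \0 is the NUL character rather than the digit 0, so the lookahead never blocks a match on this input.)

The pattern you have specifically looks for 1s that are followed by a 0.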
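
A quick check in Python, offered as a sketch rather than a definitive analysis: the lookahead on \0 and a lookahead on the literal digit 0 behave differently on the string from the question, because 1+ can backtrack to satisfy the lookahead. The variable names are illustrative only.

```python
import re

s = '001111000110000111111'

# \0 is the NUL character, which never occurs in s,
# so the negative lookahead always passes.
nul_lookahead = re.findall(r'1+(?!\0)', s)

# With a literal 0, the engine backtracks: 1+ gives up one 1
# so that the next character is a 1 rather than a 0.
zero_lookahead = re.findall(r'1+(?!0)', s)

print(nul_lookahead)   # ['1111', '11', '111111']
print(zero_lookahead)  # ['111', '1', '111111']
```

So the pattern as written returns the expected list, but only because the lookahead is effectively inert; plain 1+ gives the same result.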

You can experiment with regexes on various jsfiddle-like sites, which give interactive explanations of what each pattern is doing, e.g.:

https://regex101.com/r/qY4iN9/1

Upvotes: 0
