Reputation: 13
I am trying to extract all sequences of '1's from a string of binary digits (0 and 1) and get them into a list.
For example, the string may be of the form 001111000110000111111, and I am looking for a list that looks like this: ["1111", "11", "111111"].
I am using the Python findall function with the pattern ([1]+?)0. However, it does not match the last sequence of 1's, since that ends with an EOS rather than a '0'. I have tried ([1]+?)0|$ to capture the EOS as a valid delimiter, but that fails too.
Any help appreciated.
Upvotes: 1
Views: 82
Reputation: 626748
Matching: To match one or more 1s, use the 1+ regex.
Splitting: You may split on one or more 0s and remove the empty elements.
See Python demo:
import re
s = '001111000110000111111'
print(re.findall('1+', s)) # ['1111', '11', '111111']
print([x for x in re.split('0+', s) if x]) # ['1111', '11', '111111']
Upvotes: 0
Reputation: 16629
What you are trying:
([1]+?)0
([1]+?)0|$
What will work:
(1+)
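To see why, here is a small check of each pattern against the sample string from the question (a sketch; the variable names are only for illustration). In the second attempt the |$ applies to the whole pattern, so at the end of the string it matches an empty string with no group captured. Scoping the alternation with a non-capturing group, (?:0|$), is one possible repair, though plain 1+ is simpler.

```python
import re

s = '001111000110000111111'

# Original attempt: every run must be followed by a literal 0,
# so the final run (which ends at the end of the string) is skipped.
first_try = re.findall(r'([1]+?)0', s)
print(first_try)   # ['1111', '11']

# Second attempt: the alternation is "([1]+?)0" OR "$", so "$" matches
# an empty string at the end and the group captures nothing there.
second_try = re.findall(r'([1]+?)0|$', s)
print(second_try)  # ['1111', '11', '']

# Scoping the alternation so that either 0 or end-of-string ends a run.
scoped = re.findall(r'(1+?)(?:0|$)', s)
print(scoped)      # ['1111', '11', '111111']

# The simplest pattern: just match each run of 1s.
simple = re.findall(r'(1+)', s)
print(simple)      # ['1111', '11', '111111']
```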
Upvotes: 1
Reputation: 296
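I think the regex you're looking for is:
1+(?!\0)
i.e. match one or more 1s. Note that \0 in a Python regex is an escape for the NUL character rather than the digit 0, so the lookahead never rejects anything in a string of 0s and 1s and this behaves like plain 1+.
The one you have specifically looks for 1s that are followed by a 0, which is why the final run is missed.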
You can play around with regexes on various jsfiddle-like sites, which give interactive explanations of what they're doing, e.g.:
https://regex101.com/r/qY4iN9/1
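As a quick sanity check of how the lookahead behaves in Python's re module (a sketch on the sample string from the question; the variable names are illustrative): with \0 the lookahead tests for a NUL character, so the full runs come back, whereas a lookahead on a literal 0 makes the engine backtrack and drop the last 1 of every run that is followed by a 0.

```python
import re

s = '001111000110000111111'

# \0 is an octal escape for the NUL character, so this lookahead
# never rejects a position in a string made only of 0s and 1s.
nul_lookahead = re.findall(r'1+(?!\0)', s)
print(nul_lookahead)   # ['1111', '11', '111111']

# With a literal 0, the greedy 1+ backtracks by one character whenever
# the run is followed by a 0, so those runs come back one digit short.
zero_lookahead = re.findall(r'1+(?!0)', s)
print(zero_lookahead)  # ['111', '1', '111111']
```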
Upvotes: 0