Reputation: 52191
I have a binary number, for example 10000111000011, and want to split it into groups of consecutive 1s and 0s: 1 0000 111 0000 11.
I thought this was a great opportunity to use look-arounds. My regex uses a positive look-behind for a digit (which it captures for later backreferencing), then a negative look-ahead for that same digit (using the backreference), so I should get a split wherever a digit is followed by a different digit.
use strict;
use warnings;
use feature 'say';
my $bin_string = '10000111000011';
my @groups = split /(?<=(\d))(?!\g1)/, $bin_string;
say "@groups";
However, this results in
1 1 0000 0 111 1 0000 0 11 1
Somehow, the captured digit is inserted at every split. What went wrong?
Upvotes: 3
Views: 1718
Reputation: 67968
(?<=0)(?=1)|(?<=1)(?=0)
Simply split on this pattern. See the demo:
https://regex101.com/r/fM9lY3/3
The lookarounds find every position where there is a 0 behind and a 1 ahead, or a 1 behind and a 0 ahead. This yields the correct split without consuming any characters.
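Plugged into the script from the question, this could look like the following minimal sketch. It reuses the question's $bin_string and @groups names and assumes the input contains only 0s and 1s.

```perl
use strict;
use warnings;

my $bin_string = '10000111000011';

# Split at every zero-width position where the digit changes:
# a 0 behind and a 1 ahead, or a 1 behind and a 0 ahead.
my @groups = split /(?<=0)(?=1)|(?<=1)(?=0)/, $bin_string;

print "@groups\n";    # prints: 1 0000 111 0000 11
```

Because the pattern has no capturing group, split returns only the fields themselves.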
Upvotes: 1
Reputation: 626961
Here is a small fix for your code:
my @groups = split /(?<=0(?!0)|1(?!1))/, $bin_string;
The problem is that split also returns the text matched by any capturing group as elements of the resulting list. So the solution is to get rid of the capturing group. Since your input contains only 0 and 1, this is easy with an alternation and a lookahead that makes sure the next digit is different.
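To see the capturing-group behaviour in isolation, here is a small sketch. The comma-separated input and the variable names are made up for illustration and are not from the question.

```perl
use strict;
use warnings;

# Illustrative input, not from the question.
my $csv = 'a,b,c';

# With a capturing group, split also returns each captured separator.
my @with_capture = split /(,)/, $csv;

# With a non-capturing group, only the fields are returned.
my @without_capture = split /(?:,)/, $csv;

print "@with_capture\n";       # prints: a , b , c
print "@without_capture\n";    # prints: a b c
```

The same thing happens in the question's pattern: (\d) inside the look-behind is still a capturing group, so the digit it captures is added to the list at every split point.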
Upvotes: 3
Reputation: 174736
Just do matching instead of splitting.
(\d)\1*
Example:
use strict;
use warnings;
use feature 'say';
my $bin_string = '10000111000011';
# $1 is the whole run of identical digits; $2 is the digit being repeated.
while ($bin_string =~ m/((\d)\2*)/g) {
    print "$1\n";
}
Upvotes: 1