Eirik Birkeland

Reputation: 608

Split string where the character changes

I want to create an array that splits whenever a character stops repeating. My current code is:

my $str = "1233345abcdde";
print "$_," for split /(?<=(.))(?!\1)/, $str;

This returns: 1,1,2,2,333,3,4,4,5,5,a,a,b,b,c,c,dd,d,e,e,

However, what I really want is: 1,2,333,4,5,a,b,c,dd,e, (i.e. without the duplicated characters).

What's wrong? I suspect the problem has to do with the nature of lookarounds, but I can't pin it down ...

Upvotes: 1

Views: 477

Answers (4)

ikegami

Reputation: 385575

When the split pattern contains captures, the text they capture gets returned too. You could filter out these extra values.

my $i; my @matches = grep { ++$i % 2 } split /(?<=(.))(?!\1)/s, $str;

use List::Util qw( pairkeys );  # 1.29+
my @matches = pairkeys split /(?<=(.))(?!\1)/s, $str;

It's simpler to use a regex match.

my @matches; push @matches, $1 while $str =~ /((.)\2*)/sg;

my $i; my @matches = grep { ++$i % 2 } $str =~ /((.)\2*)/sg;

use List::Util qw( pairkeys );  # 1.29+
my @matches = pairkeys $str =~ /((.)\2*)/sg;
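As a quick end-to-end check, here is a minimal sketch of the loop-based variant above, run on the sample string from the question. The surrounding script (the strict/warnings pragmas and the final print) is only scaffolding added for illustration.

```perl
use strict;
use warnings;

# Sample input taken from the question.
my $str = "1233345abcdde";

# Each match sets two captures: $1 is the whole run,
# $2 is the single character that the run repeats.
my @matches;
push @matches, $1 while $str =~ /((.)\2*)/sg;

print join(",", @matches), "\n";   # prints 1,2,333,4,5,a,b,c,dd,e
```

Only $1 is pushed, so nothing has to be filtered out afterwards.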

Upvotes: 1

Aristotle Pagaltzis

Reputation: 117939

This will do what you want, but you almost certainly shouldn’t use it:

split /(??{ (substr $_, (pos)-1, 1) eq (substr $_, pos, 1) ? '(?!)' : '' })/, $str

Upvotes: 2

Sobrique

Reputation: 53478

No, the problem is that you're using a capture group in split, which returns the captured text along with the split fields.

use Data::Dumper;
my @stuff = split /(=)/, "this=that";
print Dumper \@stuff;

Gives:

$VAR1 = [
          'this',
          '=',
          'that'
        ];

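Unfortunately it's not easy to 'fix'. The best I could come up with is to skip the odd-numbered elements, for example by loading the list into a hash so that the fields become keys and the captures become values: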

my %stuff =  split /(?<=(.))(?!\1)/, $str;
print Dumper \%stuff;

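(That won't preserve the original order, though, because hashes are unordered, and a run that occurs more than once collapses into a single key.)

But you can sort the keys, which happens to reproduce the original order for this input: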

print join (",", sort keys %stuff);

Or perhaps:

my $str = "1233345abcdde";
my @stuff =  split /(?<=(.))(?!\1)/, $str;
print join ( ",", @stuff[grep { not $_ & 1 } 0..$#stuff] ),"\n";
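To illustrate how the hash version and the index-based version above differ, here is a minimal sketch. The input "aabaa" is a hypothetical string chosen so that the same run occurs twice, and the names @pairs and @runs are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical input in which the run "aa" occurs twice.
my $str = "aabaa";

# With a capture group, split returns field, capture, field, capture, ...
my @pairs = split /(?<=(.))(?!\1)/, $str;

# Assigning that list to a hash keeps one entry per distinct field.
my %stuff = @pairs;

# Keeping the even-indexed elements preserves both order and repeats.
my @runs = @pairs[grep { not $_ & 1 } 0 .. $#pairs];

print join(",", sort keys %stuff), "\n";   # prints aa,b
print join(",", @runs), "\n";              # prints aa,b,aa
```

The array slice is therefore the safer choice whenever the input may contain the same run more than once.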

Upvotes: 2

LeoNerd

Reputation: 8532

When the split regexp includes a capture group, the return list also includes the values from these captures. You'll have to filter them out somehow.

You want the same approach as in the answer to:

Perl split function - use repeating characters as delimiter

Upvotes: 1
