Reputation: 437
I have a string which consists of several words, with a capital letter marking the start of each word.
For example:
$string1="TestWater"; # to be split into an array @string1=("Test","Water")
$string2="TodayIsNiceDay"; # as @string2=("Today","Is","Nice","Day")
$string3="EODIsAlwaysGood"; # as @string3=("EOD","Is","Always","Good")
I know that Perl can easily split on a fixed character with the split function, and that a regex match can capture a fixed number of pieces into $1, $2, and so on. But how can this be done dynamically? Thanks in advance!
The post "Spliting CamelCase" doesn't answer my question: mine is about regexes in Perl, while that one was about Java (and the differences matter here).
Upvotes: 1
Views: 1796
Reputation: 85767
You can do this by using m//g in list context, which returns a list of all matches found. (Rule of thumb: Use m//g if you know what you want to extract; use split if you know what you want to throw away.)
Your case is a bit more complicated because you want to split "EODIs" into ("EOD", "Is").
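To illustrate why the acronym case needs extra care, here is a minimal sketch using a naive pattern (an assumption for illustration, not part of the original answer): one uppercase letter followed by any number of lowercase letters. It works for "TestWater" but breaks "EOD" into single letters.

```perl
use strict;
use warnings;

# Naive pattern (assumed for illustration): an uppercase letter
# followed by zero or more lowercase letters.
# m//g in list context returns every match as a list.
my @simple  = 'TestWater'       =~ /\p{Lu}\p{Ll}*/g;
my @acronym = 'EODIsAlwaysGood' =~ /\p{Lu}\p{Ll}*/g;

print join(',', @simple),  "\n";   # Test,Water
print join(',', @acronym), "\n";   # E,O,D,Is,Always,Good
```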
The following code handles this case:
my @words = $string =~ /\p{Lu}(?:\p{Lu}+(?!\p{Ll})|\p{Ll}*)/g;
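As a sketch, the same pattern can be written with the /x flag so that each part carries a comment, and then checked against the three strings from the question. The helper name camel_words is hypothetical and only used here for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the regex above, spelled out with /x.
# Call it in list context; in scalar context m//g behaves differently.
sub camel_words {
    my ($string) = @_;
    return $string =~ /
        \p{Lu}                  # every word starts with an uppercase letter
        (?:
            \p{Lu}+ (?!\p{Ll})  # acronym: more uppercase letters, stopping
                                # before the capital that starts the next word
          |
            \p{Ll}*             # ordinary word: zero or more lowercase letters
        )
    /xg;
}

for my $s (qw(TestWater TodayIsNiceDay EODIsAlwaysGood)) {
    print join(',', camel_words($s)), "\n";
}
```

For "EODIs", the \p{Lu}+ part first grabs "ODI", sees the lowercase "s" after it, and backtracks to "OD", which is how the "I" ends up starting the next word.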
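I.e. every word starts with an uppercase letter (\p{Lu}) and is followed by either one or more uppercase letters (the last of which is not followed by a lowercase letter) or zero or more lowercase letters (\p{Ll}).
Upvotes: 4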
Reputation: 241768
Use split to split a string on a regex. What you want is an upper case character not followed by an upper case character as the boundary, which can be expressed by two look-ahead assertions (perlre for details):
#!/usr/bin/perl
use warnings;
use strict;
use Test::More;
sub split_on_capital {
    my ($string) = @_;
    return [ split /(?=[[:upper:]](?![[:upper:]]))/, $string ];
}
is_deeply split_on_capital('TestWater'), [ 'Test', 'Water' ];
is_deeply split_on_capital('TodayIsNiceDay'), [ 'Today', 'Is', 'Nice', 'Day' ];
is_deeply split_on_capital('EODIsAlwaysGood'), [ 'EOD', 'Is', 'Always', 'Good' ];
done_testing();
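The two look-ahead assertions can be read more easily when the pattern is written with /x. The following is a sketch only: split_camel is a hypothetical variant that returns a plain list instead of an array reference, which matches the @string3 form asked for in the question.

```perl
use strict;
use warnings;

# Hypothetical variant of split_on_capital that returns a list.
sub split_camel {
    my ($string) = @_;
    return split /
        (?=                   # split at a position that is followed by ...
            [[:upper:]]       # ... an uppercase character
            (?![[:upper:]])   # ... which is not itself followed by uppercase
        )
    /x, $string;
}

my @string3 = split_camel('EODIsAlwaysGood');
print join(',', @string3), "\n";
```

Because the whole pattern is a zero-width look-ahead, split consumes no characters, so every letter of the input ends up in one of the returned pieces.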
Upvotes: 7