I have some unsanitized data that I need to split into an array using Perl. Ideally, the data would be a sequence of values separated by commas, in which case I would split it with the following pattern:
/,\s*/
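For reference, here is a minimal sketch of that comma-only case. The input string is made up for illustration. Note that `/,\s*/` only absorbs whitespace after a comma, not before it:

```perl
use strict;
use warnings;

# Hypothetical comma-only input: the "ideal" case described above
my $clean = "Thirteen, Boy,Xbox";

# Split on a comma followed by any amount of whitespace
my @fields = split /,\s*/, $clean;

print join('|', @fields), "\n";   # prints "Thirteen|Boy|Xbox"
```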
Unfortunately this is a bit of a special case. Here is an example of the data I have:
Cat Bag
Dog Hair
Turkey brown Caller
Thirteen,BoyXbox
Mac
LizardDinosaur
The final array should be:
[Cat Bag, Dog Hair, Turkey brown Caller, Thirteen, Boy, Xbox, Mac, Lizard, Dinosaur]
As you can see, I need to split on newline characters, on commas, and between two words that run together with no space (e.g. BoyXbox).
Thanks!
This is pretty much a literal implementation of the requirements:
use strict;
use warnings;
use 5.010;
my $s = <<END_STRING;
Cat Bag
Dog Hair
Turkey brown Caller
Thirteen,BoyXbox
Mac
LizardDinosaur
JRAinsley-McEwan Class1C
END_STRING
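# Split on a newline or comma (absorbing any surrounding whitespace), or at the
# zero-width position between a non-space character and a capital letter
my @s = split /\s*[\n,]\s*|(?<=\S)(?=[A-Z])/, $s;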
say join ', ', map qq{"$_"}, @s;
Output:
"Cat Bag", "Dog Hair", "Turkey brown Caller", "Thirteen", "Boy", "Xbox", "Mac", "Lizard", "Dinosaur", "J", "R", "Ainsley-", "Mc", "Ewan Class1", "C"
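If the splits inside acronyms such as "JR" are unwanted, one possible variation (an assumption about what is preferable, not part of the answer above) is to split between words only where a lowercase letter is directly followed by an uppercase letter. It still cannot tell that "McEwan" is a single name, which is the inherent limit of splitting on letter case alone. `[a-z]` and `[A-Z]` cover ASCII letters only; `\p{Ll}` and `\p{Lu}` are the Unicode equivalents.

```perl
use strict;
use warnings;
use 5.010;

my $s = <<END_STRING;
Cat Bag
Dog Hair
Turkey brown Caller
Thirteen,BoyXbox
Mac
LizardDinosaur
JRAinsley-McEwan Class1C
END_STRING

# Split on a newline or comma (absorbing surrounding whitespace), or at the
# zero-width position between a lowercase letter and an uppercase letter
my @s = split /\s*[\n,]\s*|(?<=[a-z])(?=[A-Z])/, $s;

say join ', ', map qq{"$_"}, @s;
```

This prints `"Cat Bag", "Dog Hair", "Turkey brown Caller", "Thirteen", "Boy", "Xbox", "Mac", "Lizard", "Dinosaur", "JRAinsley-Mc", "Ewan Class1C"`. The trailing newline produces an empty final field, which `split` discards automatically.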