ansario

Reputation: 345

Regex for unsanitized data - Perl

I have some unsanitized data that I need to split into an array using Perl. Ideally, I would have a sequence of values separated by commas, in which case I would split the data with the following pattern:

/,\s*/

Unfortunately, this is a bit of a special case. Here is an example of the data I have:

Cat Bag
Dog Hair
Turkey brown Caller
Thirteen,BoyXbox
Mac
LizardDinosaur 

The final array should be:

[Cat Bag, Dog Hair, Turkey brown Caller, Thirteen, Boy, Xbox, Mac, Lizard, Dinosaur]

As you can see, I need to split on newline characters, on commas, and wherever two words run together with no space between them (e.g. BoyXbox).

Thanks!

Upvotes: 0

Views: 76

Answers (1)

Borodin

Reputation: 126742

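This is pretty much a literal implementation of the requirement: split on a comma or newline, or at any point where a non-space character is immediately followed by a capital letter.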

use strict;
use warnings;
use 5.010;

my $s = <<END_STRING;
Cat Bag
Dog Hair
Turkey brown Caller
Thirteen,BoyXbox
Mac
LizardDinosaur
JRAinsley-McEwan Class1C
END_STRING

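# Split on a newline or comma (plus any surrounding whitespace),
# or at the zero-width point where a non-space character precedes a capital
my @s = split /\s*[\n,]\s*|(?<=\S)(?=[A-Z])/, $s;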

say join ', ', map qq{"$_"}, @s;

output

"Cat Bag", "Dog Hair", "Turkey brown Caller", "Thirteen", "Boy", "Xbox", "Mac", "Lizard", "Dinosaur", "J", "R", "Ainsley-", "Mc", "Ewan Class1", "C"
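
The last few fields show the side effect of taking the requirement literally: the split happens before every capital letter that follows a non-space character, so JRAinsley-McEwan and Class1C are broken up as well. If that is not wanted, one possible refinement is to split only where a lowercase letter is immediately followed by an uppercase one. This is an assumption about the intended behaviour, not something the question states. A minimal sketch, using a shortened sample string and the illustrative variable name @parts:

```perl
use strict;
use warnings;
use 5.010;

# Shortened sample data (a subset of the lines used above)
my $s = "Thirteen,BoyXbox\nLizardDinosaur\nJRAinsley-McEwan Class1C\n";

# Split on a newline or comma (plus surrounding whitespace), or at the
# zero-width boundary between a lowercase and an uppercase ASCII letter.
# Assumption: run-together words always meet at a lower/upper boundary.
my @parts = split /\s*[\n,]\s*|(?<=[a-z])(?=[A-Z])/, $s;

say join ', ', map qq{"$_"}, @parts;
# prints: "Thirteen", "Boy", "Xbox", "Lizard", "Dinosaur", "JRAinsley-Mc", "Ewan Class1C"
```

With this pattern JRAinsley and Class1C stay intact, but McEwan is still split into Mc and Ewan, so names with internal capitals would need separate handling.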

Upvotes: 1
