user749632

Reputation: 1989

Match multiple patterns and extract subpatterns into an array in Perl

I have the following string in $str:

assign (rregbus_z_partially_resident | regbus_s_partially_resident | reg_two | )regbus_;

I want to parse this line and capture into an array only the substrings that start with a non-word character followed by either reg_\w+ or regbus_\w+.

So in the above example, I want to capture only regbus_s_partially_resident and reg_two into an array.

I tried this and it did not work:

my (@all_matches) = ($str =~ m/\W(reg_\w+)|\W(regbus_\w+)/g);

Since I am using \W, it's copying the non-word character also into the array list, which I do not want.

Upvotes: 1

Views: 5519

Answers (2)

zdim

Reputation: 66873

Your regex needs a small tweak: put both alternatives inside a single capture group

my @all_matches = $str =~ m/\W(reg_\w+|regbus_\w+)/g;

or

my @all_matches = $str =~ m/\W( (?:reg|regbus)_\w+ )/gx;

or even something along the lines of

my @all_matches = $str =~ m/\W( reg(?:bus)?_\w+ )/gx;

The most suitable form depends on what patterns you may need and how this is used.
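To check that the three forms above behave the same on the sample string from the question, here is a minimal sketch. The labels and the names @variants and %found are only for illustration and are not part of the original code.

```perl
use strict;
use warnings;

# Sample string from the question.
my $str = 'assign (rregbus_z_partially_resident | regbus_s_partially_resident | reg_two | )regbus_;';

# Three spellings of the same idea: one capture group after \W.
# The labels are illustrative only.
my @variants = (
    [ 'alternation'  => qr/\W(reg_\w+|regbus_\w+)/ ],
    [ 'shared tail'  => qr/\W( (?:reg|regbus)_\w+ )/x ],
    [ 'optional bus' => qr/\W( reg(?:bus)?_\w+ )/x ],
);

my %found;
for my $variant (@variants) {
    my ($label, $re) = @$variant;
    # In list context, /g returns the capture group contents for every match.
    my @all_matches = $str =~ /$re/g;
    $found{$label} = \@all_matches;
    print "$label: @all_matches\n";
}
```

Each line of output should list regbus_s_partially_resident and reg_two. The trailing regbus_ is skipped because \w+ needs at least one character after the underscore.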

Or split the string on non-word characters first, and apply the regex only to the part that matters

my @matches = grep { /^(?:reg_\w+|regbus_\w+)/ } split /\W/, $str;

which may be helpful if your strings or requirements grow more complex.
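A runnable sketch of the split-and-filter idea on the sample string follows. It differs from the line above in two small ways that are my own assumptions: it splits on \W+ instead of \W to avoid empty fields, and it folds the two alternatives into reg(?:bus)?_.

```perl
use strict;
use warnings;

my $str = 'assign (rregbus_z_partially_resident | regbus_s_partially_resident | reg_two | )regbus_;';

# Break the string into candidate words. Runs of non-word characters
# act as a single separator, so no empty fields appear between them.
my @words = split /\W+/, $str;

# Keep only words that begin with reg_ or regbus_ followed by
# at least one more word character.
my @matches = grep { /^reg(?:bus)?_\w/ } @words;

print "$_\n" for @matches;
```

One difference from the \W-prefixed regexes: a name at the very start of the string is also kept here, since no preceding non-word character is required. Extra conditions can be added inside the grep block without making the pattern itself harder to read.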

Upvotes: 2

ikegami

Reputation: 385506

"it's copying the non-word character also into the array list"

No, it doesn't.

$ perl -le'
   my $str = "assign (rregbus_z_partially_resident | regbus_s_partially_resident | reg_two | )regbus_;";
   my (@all_matches) = ($str =~ m/\W(reg_\w+)|\W(regbus_\w+)/g);
   print $_ // "[undef]" for @all_matches;
'
[undef]
regbus_s_partially_resident
reg_two
[undef]

But you do have a problem: you have two capture groups, so you get two values per match, one of which is undef.

Fix:

my @all_matches;
push @all_matches, $1 // $2 while $str =~ m/\W(reg_\w+)|\W(regbus_\w+)/g;
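If you want to keep the two-alternative pattern, two other ways to drop the undef entries are sketched below: filtering the list with grep, or using a branch reset group (?|...), available since Perl 5.10, so that both alternatives share capture group 1. The variable names @filtered and @reset are illustrative.

```perl
use strict;
use warnings;

my $str = 'assign (rregbus_z_partially_resident | regbus_s_partially_resident | reg_two | )regbus_;';

# Option 1: keep both capture groups and discard the undef
# that the non-participating group contributes for each match.
my @filtered = grep { defined } $str =~ m/\W(reg_\w+)|\W(regbus_\w+)/g;

# Option 2: branch reset. Inside (?|...), each alternative numbers its
# groups from the same starting point, so there is only one group overall.
my @reset = $str =~ m/(?|\W(reg_\w+)|\W(regbus_\w+))/g;

print "filtered: @filtered\n";
print "reset:    @reset\n";
```

Both lists should contain regbus_s_partially_resident and reg_two, in that order.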

Far better:

my @all_matches = $str =~ m/\W(reg(?:bus)?_\w+)/g;

Even better yet:

my @all_matches = $str =~ m/\b(reg(?:bus)?_\w+)/g;
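The practical difference between \W and \b shows up when a name sits at the very start of the string. \W has to consume a character, while \b is a zero-width assertion that also holds at the start of the string when the first character is a word character. A small sketch, using a made-up input string that is not from the question:

```perl
use strict;
use warnings;

# Hypothetical input with a matching name at position 0.
my $str = 'reg_one | regbus_two | xreg_three';

# \W needs an actual non-word character before the name,
# so reg_one at the start of the string is missed.
my @with_W = $str =~ m/\W(reg(?:bus)?_\w+)/g;

# \b only asserts a word boundary, which also exists at the start.
my @with_b = $str =~ m/\b(reg(?:bus)?_\w+)/g;

print "with \\W: @with_W\n";
print "with \\b: @with_b\n";
```

In both cases xreg_three is rejected, because there is neither a non-word character nor a word boundary between x and r.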

Upvotes: 2
