Reputation: 141
I am reading in another Perl file and trying to find all strings surrounded by quotation marks within the file, single-line or multiline. I've matched all the single-line strings fine, but I can't match the multiline ones without printing the entire line out, when I just want the string itself. For example, here's a snippet of what I'm reading in:
#!/usr/bin/env perl
use warnings;
use strict;
# assign variable
my $string = 'Hello World!';
my $string4 = "chmod";
my $string3 = "This is a fun
multiple line string, please match";
So the output I'd like is:
'Hello World!';
"chmod";
"This is a fun multiple line string, please match";
but I am getting:
'Hello World!';
my $string4 = "chmod";
my $string3 = "This is a fun
multiple line string, please match";
This is the code I am using to find the strings - all file content is stored in @contents:
my @strings_found = ();
my $line;
for(@contents) {
$line .= $_;
}
if($line =~ /(['"](.?)*["'])/s) {
push @strings_found,$1;
}
print @strings_found;
I am guessing I am only getting 'Hello World!'; correctly because I am using $1, but I am not sure how else to find the others without looping line by line, which I would think would make it hard to find the multiline string, since the loop doesn't know what the next line is.
I know my regex is reasonably basic and doesn't account for some caveats, but I just wanted to get a basic catch-most regex working before moving on to more complex situations.
Any pointers as to where I am going wrong?
Upvotes: 1
Views: 15106
Reputation: 35208
A couple of big things: you need to search in a while loop with the g modifier on your regex. You also need to turn off greedy matching for what's inside the quotes by using .*? instead of .* there.
use strict;
use warnings;
my $contents = do {local $/; <DATA>};
my @strings_found = ();
while ($contents =~ /(['"](.*?)["'])/sg) {
push @strings_found, $1;
}
print "$_\n" for @strings_found;
__DATA__
#!/usr/bin/env perl
use warnings;
use strict;
# assign variable
my $string = 'Hello World!';
my $string4 = "chmod";
my $string3 = "This is a fun
multiple line string, please match";
Outputs
'Hello World!'
"chmod"
"This is a fun
multiple line string, please match"
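The question asked for the multi-line string printed on one line. A minimal sketch of one way to get that, assuming it is acceptable to fold each line break (and the whitespace around it) into a single space; the helper name quoted_strings_one_line is hypothetical and not part of the code above:

```perl
use strict;
use warnings;

# Hypothetical helper: return every quoted string found in $text, with any
# embedded line breaks folded into single spaces.
sub quoted_strings_one_line {
    my ($text) = @_;
    my @found;
    while ($text =~ /(['"].*?["'])/sg) {
        my $match = $1;             # copy $1 before running another regex
        $match =~ s/\s*\n\s*/ /g;   # join a multi-line string onto one line
        push @found, $match;
    }
    return @found;
}

my $sample = <<'END';
my $string = 'Hello World!';
my $string4 = "chmod";
my $string3 = "This is a fun
multiple line string, please match";
END

print "$_\n" for quoted_strings_one_line($sample);
```

This changes the content of the captured string, so it only makes sense if you want the text for display rather than an exact copy of the literal.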
You aren't the first person to search for help with this homework problem. Here's a more detailed answer I gave to ... well ... you ;) in "finding words surround by quotations perl".
Upvotes: 5
Reputation: 4398
Regexp matching (in Perl and generally) is greedy by default, so your regexp will match from the first ' or " to the last. Print the length of your @strings_found array; I think it will always be just 1 with the code you have.
Change it to be non-greedy by following the * with a ?, something like /(['"].*?["'])/s I think.
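A small sketch of the difference, using a made-up two-line input rather than the file from the question. It counts how many matches the greedy and non-greedy forms produce when applied with the g modifier in list context:

```perl
use strict;
use warnings;

# Made-up sample input: two short quoted strings on separate lines.
my $text = qq{my \$a = 'one';\nmy \$b = "two";\n};

# Greedy: .* runs on to the last quote in the text, so there is one big match.
my @greedy = $text =~ /(['"].*["'])/sg;

# Non-greedy: .*? stops at the nearest closing quote, so each string matches.
my @lazy = $text =~ /(['"].*?["'])/sg;

printf "greedy: %d match(es)\n", scalar @greedy;
printf "lazy:   %d match(es)\n", scalar @lazy;
```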
It will work in a basic way. Regexps are kind of the wrong tool for this if you want a robust solution; for that you would want to write parsing code instead. If you have different quotes inside a string, greedy matching will give you the one biggest string. Non-greedy matching will give you the smallest strings, without caring whether the start and end quotes are different.
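One way to reduce the mismatched-quote problem while still using a regex is to capture the opening quote and require the same character at the end with a backreference. This is only a sketch: the helper name same_quote_strings is hypothetical, and it still ignores escaped quotes, comments, and operators such as q// and qq//, so it is not a substitute for real parsing.

```perl
use strict;
use warnings;

# Hypothetical helper: group 2 captures the opening quote, and \2 requires
# the closing quote to be the same character.
sub same_quote_strings {
    my ($text) = @_;
    my @found;
    while ($text =~ /((['"]).*?\2)/sg) {
        push @found, $1;
    }
    return @found;
}

# Each string contains the other kind of quote inside it.
print "$_\n" for same_quote_strings(q{my $s = "it's fine"; my $t = 'say "hi"';});
```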
Read about greedy and non-greedy matching. Also note the /m multiline modifier, which only changes where ^ and $ match; it is /s that lets . match a newline. http://perldoc.perl.org/perlre.html#Regular-Expressions
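To see which modifier matters for a string that spans lines, here is a short sketch on a made-up two-line input. It tries the same pattern with no modifier, with /s, and with /m:

```perl
use strict;
use warnings;

# Made-up input: one double-quoted string containing a newline.
my $text = "\"first\nsecond\"";

# Without /s, '.' does not match "\n", so the two-line string is not found.
my $without_s = $text =~ /".*?"/  ? 1 : 0;
# With /s, '.' also matches "\n", so the whole string is found.
my $with_s    = $text =~ /".*?"/s ? 1 : 0;
# /m only lets ^ and $ match at line boundaries; '.' is unaffected.
my $with_m    = $text =~ /".*?"/m ? 1 : 0;

print "no modifier: $without_s, /s: $with_s, /m: $with_m\n";
```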
Upvotes: 1