Reputation: 1032
I'm writing a small script that is supposed to match all strings in another file (the text between "" or '', including the quote characters themselves).
Below is the regex I am currently using; however, it only produces results for '(.*)' and not for "(.*)":
my @string_matches = ($file_string =~ /'(.*)' | "(.*)"/g);
print "\n@string_matches";
Also, how can I include the "" or '' symbols in the results (print "string" instead of just string)? I've tried searching online but couldn't find any material on this.
$file_string is basically a string version of an entire file.
Upvotes: 0
Views: 5914
Reputation: 15121
You could use '[^']*' to match a string between single quotes, and "[^"]*" for double quotes.
If you need to support other features, such as escape sequences, consider using the modules Text::ParseWords or Text::Balanced.
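To illustrate what handling escape sequences involves, here is a minimal sketch that uses a plain regex rather than either module (a swapped-in technique, not the modules' API). It assumes a backslash escapes the next character, and the sample input and variable names are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical sample input; backslashes inside q{} are kept literally.
my $text = q{say "a \"quoted\" word" and 'it\'s'};

# Inside each kind of quote, accept either a non-quote, non-backslash
# character or a backslash followed by any character.
my $quoted = qr/"(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*'/;

my @matches = ($text =~ /($quoted)/g);
print "$_\n" for @matches;
```

This only covers backslash escapes on a single line; for anything more involved, the modules are likely the better choice.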
Note:
Because * is greedy, '.*' matches everything between the first and the last single quote. If your string contains more than one single-quoted substring, this gives one match instead of several.
You can use ('[^']*') instead of '([^']*)' to capture the single quotes together with the substring between them; double quotes work the same way.
Because '[^']*' and "[^"]*" cannot both match at the same position, m/('[^']*')|("[^"]*")/ with /g returns some undef values in list context: each match fills one capture group and leaves the other undefined. Using a single group, m/('[^']*'|"[^"]*")/g, fixes this.
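The points in the note can be checked with a short sketch. The sample strings and variable names are assumptions made up for illustration.

```perl
use strict;
use warnings;

my $singles = q{'a' and 'b'};

# Greedy .* runs from the first quote to the last one: a single match.
my @greedy = ($singles =~ /('.*')/g);

# The negated character class stops at the next quote: two matches.
my @negated = ($singles =~ /('[^']*')/g);

my $mixed = q{a 'one' b "two" c};

# Two capture groups: every match fills one group and leaves the other undef.
my @two_groups = ($mixed =~ /('[^']*')|("[^"]*")/g);

# One group around the alternation: exactly one value per match.
my @one_group = ($mixed =~ /('[^']*'|"[^"]*")/g);

printf "greedy: %d, negated: %d\n", scalar @greedy, scalar @negated;
printf "two groups: %d values, %d undef\n",
    scalar @two_groups, scalar grep { !defined } @two_groups;
printf "one group: %d values\n", scalar @one_group;
```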
Here is a test program:
#!/usr/bin/perl
use strict;
use warnings;

my $file_string = q{Test "test in double quotes" test 'test in single quotes' and "test in double quotes again" test};
# One capture group around the alternation keeps the quotes and avoids undef entries.
my @string_matches = ($file_string =~ /('[^']*'|"[^"]*")/g);
# Print one match per line.
local $" = "\n";
print "@string_matches\n";
Testing:
$ perl t.pl
"test in double quotes"
'test in single quotes'
"test in double quotes again"
Upvotes: 0
Reputation: 98881
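#!/usr/local/bin/perl
use strict;
use warnings;

# Read the whole file into $string.
open my $fh, '<', 'strings.txt' or die "Cannot open strings.txt: $!";
my $string = do { local $/; <$fh> };
close $fh;

# Match a quote, the shortest run of text, then the same kind of quote.
while ($string =~ m/(['"]).*?\1/g) {
    print "$&\n";
}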
Upvotes: 0
Reputation: 11116
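Use this: '(.*?)'|"(.*?)"
I guess the greedy quantifier is matching up to the last ' in your string. Make it lazy. Also drop the spaces around |: without the /x flag they are matched literally, so each alternative also requires a space next to the quote.
IMHO, use this regex:
['"][^'"]*?['"]
The whole match includes the quotes, so this also solves your problem of not getting the quotes in the result.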
demo here : http://regex101.com/r/dI6gD7
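One caveat with the character-class pattern is that it does not require the closing quote to be the same kind as the opening one. The sketch below compares it with a backreference variant, which is an alternative I am suggesting rather than part of the answer above; the sample text and variable names are made up.

```perl
use strict;
use warnings;

# Hypothetical input with an apostrophe inside a double-quoted string.
my $text = q{say "it's fine" and 'ok'};

# Character-class pattern: any quote may close any quote.
my @loose = ($text =~ /['"][^'"]*?['"]/g);

# Backreference pattern: \2 must be the same quote that opened the string.
my @paired;
while ($text =~ /((['"]).*?\2)/g) {
    push @paired, $1;
}

print "loose:  $_\n" for @loose;
print "paired: $_\n" for @paired;
```

With this input the first pattern should pair the opening " with the apostrophe, while the second keeps each string intact.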
Upvotes: 1