I'm trying to parse a text file using Perl regular expressions. Here's an example data set:
"Field1", "Field2", "Field3", "Field4", "Field5"
"val1-1", "\\path\to\val1-2.txt", "val1-3", "\\path\to\val1-4.ini", "val1-5.txt"
"val2-1", "val2-2", "\\path\to\val2-3.txt", "\\path\to\val2-4.ini", "val2-5.txt"
"\\path\to\val3-1.txt", "val3-2", "val3-3", "\\path\to\val3-4.ini", "val3-5.txt"
For each line of text, I'm trying to match the first .txt file name: val1-2.txt, val2-3.txt and val3-1.txt in the data set above.
I thought this would work:
while (<INFILE>) {
    if (m/\\(.*?\.txt)"/) {
        print "$1\n";
    }
}
Output:
\path\to\val1-2.txt
\path\to\val2-3.txt
\path\to\val3-1.txt
But it doesn't work, because it captures the complete path, not just the file name.
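A likely reason, based on how Perl's regex engine works: it tries start positions from left to right and accepts the first one where the whole pattern can match, and the non-greedy .*? only keeps the match short from that start; it never moves the start forward. The sketch below assumes the first sample line as input and uses Perl's built-in @- array ($-[0] is the offset where the whole match begins). The names $line, $start, $captured and the SAMPLE heredoc tag are illustrative only.

```perl
use strict;
use warnings;

# First data line from the question, copied verbatim; a single-quoted
# heredoc keeps every backslash literal.
my $line = <<'SAMPLE';
"val1-1", "\\path\to\val1-2.txt", "val1-3", "\\path\to\val1-4.ini", "val1-5.txt"
SAMPLE
chomp $line;

my ($start, $captured);
if ($line =~ m/\\(.*?\.txt)"/) {
    $start    = $-[0];   # offset where the whole match begins
    $captured = $1;      # what the question's capture group grabbed
}

# The match begins at the very first backslash in the line: the engine keeps
# the leftmost start that succeeds, and .*? is then free to run across the
# later backslashes until it reaches .txt".
printf "match starts at %d, first backslash at %d\n", $start, index($line, '\\');
print "captured: $captured\n";
```

Both offsets come out as 11, and the capture still includes the \path\to\ prefix.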
Now this works:
while (<INFILE>) {
    if (my @matches = $_ =~ m/(.*?)"/g) {
        foreach (@matches) {
            print "$1\n" if (m/.*\\(.*?\.txt)/);
        }
    }
}
Output:
val1-2.txt
val2-3.txt
val3-1.txt
But surely there must be a way to do this with a single match expression?
Answer:
Try this one:
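while (<DATA>) {
    if (m/([^\\]+\.txt)"/) {
        print "$1\n";
    }
}
__DATA__
"Field1", "Field2", "Field3", "Field4", "Field5"
"val1-1", "\\path\to\val1-2.txt", "val1-3", "\\path\to\val1-4.ini", "val1-5.txt"
"val2-1", "val2-2", "\\path\to\val2-3.txt", "\\path\to\val2-4.ini", "val2-5.txt"
"\\path\to\val3-1.txt", "val3-2", "val3-3", "\\path\to\val3-4.ini", "val3-5.txt"
Output:
val1-2.txt
val2-3.txt
val3-1.txt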
You don't need the \ outside your capture group. Instead of matching just anything, match everything that's not a backslash ([^\\]). Since you want the file to have a name in front of the .txt, you want the + quantifier, not *?, which matches zero or more characters but as few as possible.
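To compare the two patterns side by side, here is a small self-contained sketch that runs the question's regex and this answer's regex over the three sample data lines, embedded with a single-quoted heredoc so the backslashes stay literal. The helper captures and the variable names are hypothetical, for illustration only.

```perl
use strict;
use warnings;

# The three data lines from the question, kept verbatim.
my @lines = split /\n/, <<'SAMPLE';
"val1-1", "\\path\to\val1-2.txt", "val1-3", "\\path\to\val1-4.ini", "val1-5.txt"
"val2-1", "val2-2", "\\path\to\val2-3.txt", "\\path\to\val2-4.ini", "val2-5.txt"
"\\path\to\val3-1.txt", "val3-2", "val3-3", "\\path\to\val3-4.ini", "val3-5.txt"
SAMPLE

# Hypothetical helper: return what group 1 of $re captures on each line
# (undef for a line that does not match).
sub captures {
    my ($re, @input) = @_;
    my @out;
    for my $line (@input) {
        push @out, $line =~ $re ? $1 : undef;
    }
    return @out;
}

# Question's attempt: starts at the first backslash, and .*? crosses later ones.
my @lazy = captures(qr/\\(.*?\.txt)"/, @lines);

# This answer's regex: the capture cannot contain a backslash at all.
my @negated = captures(qr/([^\\]+\.txt)"/, @lines);

for my $i (0 .. $#lines) {
    print "$lazy[$i]  =>  $negated[$i]\n";
}
```

On the sample data the left column shows the full \path\to\... strings and the right column only val1-2.txt, val2-3.txt and val3-1.txt.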
Answer:
How about:
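use strict;
use warnings;
use feature 'say';    # say is not available without this (or use v5.10)

my $re = qr~\\([^\\"]+)"~;
while (<DATA>) {
    chomp;
    if (my @m = /$re/g) {    # list context: one capture per quoted path
        say "@m";
    }
}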
__DATA__
"Field1", "Field2", "Field3", "Field4", "Field5"
"val1-1", "\\path\to\val1-2.txt", "val1-3", "\\path\to\val1-4.ini", "val1-5.txt"
"val2-1", "val2-2", "\\path\to\val2-3.txt", "\\path\to\val2-4.ini", "val2-5.txt"
"\\path\to\val3-1.txt", "val3-2", "val3-3", "\\path\to\val3-4.ini", "val3-5.txt"
Output:
val1-2.txt val1-4.ini
val2-3.txt val2-4.ini
val3-1.txt val3-4.ini
If you only want the first .txt, do:
use strict;
use warnings;
use feature 'say';

my $re = qr~\\([^\\"]+\.txt)~;
while (<DATA>) {
    chomp;
    /$re/ && say $1;    # only the first .txt name on each line
}