user290043

Why don't my Perl regexes correctly extract a filename from a path?

I am trying to parse the filename from paths. I have this:

my $filepath = "/Users/Eric/Documents/foldername/filename.pdf";
$filepath =~ m/^.*\\(.*[.].*)$/;
print "Linux path:";
print $1 . "\n\n";
print "-------\n";

my $filepath = "c:\\Windows\eric\filename.pdf";
$filepath =~ m/^.*\\(.*[.].*)$/;
print "Windows path:";
print $1 . "\n\n";
print "-------\n";

my $filepath = "filename.pdf";
$filepath =~ m/^.*\\(.*[.].*)$/;
print "Without path:";
print $1 . "\n\n";
print "-------\n";

But that returns:

Linux path:

-------
Windows path:Windowsic
                      ilename.pdf

-------
Without path:Windowsic
                      ilename.pdf

-------

I am expecting this:

Linux path:
filename.pdf
-------
Windows path:
filename.pdf
-------
Without path:
filename.pdf
-------

Can somebody please point out what I am doing wrong?

Thanks! :)

Upvotes: 2

Views: 4289

Answers (4)

Axeman

Reputation: 29854

Well, the answer to what is happening would be: various errors.

my $filepath = "/Users/Eric/Documents/foldername/filename.pdf";
$filepath =~ m/^.*\\(.*[.].*)$/;
print "Linux path:";
print $1 . "\n\n";
print "-------\n";

$filepath doesn't contain any backslashes, so the pattern won't match and $1 is never set. The path is separated by forward slashes instead, so your expression would have to be:

# In list context, a regex match returns its captures.
my ( $filename ) = $filepath =~ m|/([^/.]*\.[^/.]*)$|;
print "Linux path:$filename\n\n-------\n"; # interpolate instead of concatenating with .

my $filepath = "c:\\Windows\eric\filename.pdf";
$filepath =~ m/^.*\\(.*[.].*)$/;
print "Windows path:";
print $1 . "\n\n";
print "-------\n";

You're using double quotes, which, taking their cue from UNIX shells, process escape sequences where single-quoted strings do not. Here \e and \f are turned into control characters, so you need to escape all your backslashes, like this:

my $filepath = "c:\\Windows\\eric\\filename.pdf";

or just use single quotes:

my $filepath = 'c:\Windows\eric\filename.pdf';

Actually, since Perl accepts '/' as a path separator on Windows, this works too (but the regex would then have to match on '/' rather than '\'):

my $filepath = "c:/Windows/eric/filename.pdf";

That is fine as long as you convert the slashes back before handing the path to anything on Windows that insists on backslashes.
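To see what each quoting style actually stores, here is a minimal sketch that prints the control characters as hex codes. The visible() helper is a hypothetical name introduced only for this illustration.

```perl
use strict;
use warnings;

# Double quotes: \\ is one backslash, \e is ESC (0x1B), \f is form feed (0x0C).
my $double = "c:\\Windows\eric\filename.pdf";
# Single quotes: only \\ and \' are escapes, so \e and \f stay as typed.
my $single = 'c:\Windows\eric\filename.pdf';

# Hypothetical helper: show every non-printable character as <HEX>.
sub visible {
    my ($s) = @_;
    $s =~ s/([^\x20-\x7E])/sprintf('<%02X>', ord($1))/ge;
    return $s;
}

print visible($double), "\n";   # c:\Windows<1B>ric<0C>ilename.pdf
print visible($single), "\n";   # c:\Windows\eric\filename.pdf
```

The double-quoted version has lost two backslashes and two letters, which is why the regex only finds the backslash after c:.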

my $filepath = "filename.pdf";
$filepath =~ m/^.*\\(.*[.].*)$/;
print "Without path:";
print $1 . "\n\n";
print "-------\n";

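This didn't match either, so $1 still holds the capture from the last successful match (the Windows path). That's why the output is repeated. It also shows the value of capturing into a variable instead of referring to $1: after a failed match the variable is simply undefined.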
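Putting the three cases together, here is a rough sketch of a single pattern that captures in list context and treats both '/' and '\' as separators. The sub name filename_from_path is made up for this example, and treating '\' as a separator is an assumption that would be wrong for a Unix filename that really contains a backslash.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the last path component, or undef if there is none.
sub filename_from_path {
    my ($path) = @_;
    # Capture the trailing run of characters that are neither / nor \.
    # On failure the match returns an empty list, so $name stays undef.
    my ($name) = $path =~ m{([^/\\]+)\z};
    return $name;
}

for my $path ( '/Users/Eric/Documents/foldername/filename.pdf',
               'c:\Windows\eric\filename.pdf',
               'filename.pdf' ) {
    my $name = filename_from_path($path);
    print defined $name ? "$name\n" : "no filename found\n";
}
```

Because the result comes from the match itself rather than from $1, a failed match can never leak the previous capture.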

Upvotes: 2

Telemachus

Reputation: 19725

In this case, as others have said, the mistake is to do it by hand.

In addition to File::Basename, you should take a look at File::Spec and Path::Class. They offer well-tested, cross-platform methods for handling files and directories. Path::Class in particular provides helper methods for dealing with file and directory names that are foreign to the system the script lives on. It looks like that might come in handy here.

#!/usr/bin/env perl
use strict;
use warnings;
use Path::Class qw/file foreign_file/;

my $nix = "/Users/Eric/Documents/foldername/filename.pdf";
my $win = 'c:\\Windows\eric\filename.pdf'; # single quote to avoid escape issues

print file($nix)->basename(), "\n";
print foreign_file('Win32', $win)->basename(), "\n";
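If you would rather stay with core modules, a similar result seems achievable with File::Spec. This is only a sketch: it calls splitpath on the platform-specific classes File::Spec::Unix and File::Spec::Win32 directly, so each path is parsed by the rules of the system it came from rather than the system the script runs on.

```perl
use strict;
use warnings;
use File::Spec::Unix;
use File::Spec::Win32;

my $nix = '/Users/Eric/Documents/foldername/filename.pdf';
my $win = 'c:\Windows\eric\filename.pdf';

# splitpath returns (volume, directories, file); only the file part is kept.
my ( undef, undef, $nix_file ) = File::Spec::Unix->splitpath($nix);
my ( undef, undef, $win_file ) = File::Spec::Win32->splitpath($win);

print "$nix_file\n";
print "$win_file\n";
```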

Upvotes: 7

AllenJB

Reputation: 1254

Perl provides this capability: http://perldoc.perl.org/File/Basename.html

You also need to be wary of string escapes: in your double-quoted Windows path, '\\' becomes a single backslash, '\e' becomes the ESC character, and '\f' becomes a form feed. It's been a while since I've dealt with Perl escapes, but I'm guessing the terminal treats the ESC and the 'r' after it as a control sequence and swallows both. This explains the unexpected output.

Upvotes: 3

kennytm

Reputation: 523774

Why not use File::Basename?

use File::Basename;
my $name = basename($filepath);
print $name;
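One caveat, shown here as a hedged sketch: basename follows the path rules of the system the script runs on, so on a Unix host a backslash is not treated as a separator. File::Basename exports fileparse_set_fstype for switching rule sets. The variable names below are illustrative only.

```perl
use strict;
use warnings;
use File::Basename qw(basename fileparse_set_fstype);

my $unix = '/Users/Eric/Documents/foldername/filename.pdf';
my $win  = 'c:\Windows\eric\filename.pdf';

# Parsed with the host's own rules.
print basename($unix), "\n";

# Switch to Windows rules; the call returns the previous setting.
my $previous = fileparse_set_fstype('MSWin32');
print basename($win), "\n";

# Restore the original rules so later calls behave as before.
fileparse_set_fstype($previous);
```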

The regex

m/^.*\\(.*[.].*)$/
#    ^^

assumes a separator \, so cases 1 and 3 will never match. In case 2,

"c:\\Windows\eric\filename.pdf";

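\e and \f are both escape sequences in double-quoted Perl strings (ESC and form feed), so the only backslash left in the string is the one after c:. The code therefore "correctly" returns everything after it, Windows, ESC, ric, form feed, ilename.pdf, as the filename. Remember to write \\ for every backslash!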

Upvotes: 4
