Sam B

Reputation: 27608

Perl: how to get the filename and extension

I have an input file called test1.txt with hundreds and thousands of file names.

test word document.docx
...
...
amazing c. document.docx
1. 2. 3.45 document.docx
...
...

What I want to do is get the filename and extension out of each string. For the majority of the file names there is only one dot, so I can get the filename and extension by using the dot as a separator. The problem is that some file names contain multiple dots, and I have no idea how to get the extension and file name out of those.

Here's my perl code.

use strict;
use warnings;

print "Perl Starting ... \n\n"; 

open my $input_filehandle1, '<', 'test1.txt' or die "No input Filename Found test1.txt ... \n";

while (defined(my $recordLine = <$input_filehandle1>))
{
    chomp($recordLine);

    my @fields = split(/\./, $recordLine);
    my $arrayCount = @fields;


    #if the array size is more than 2 then we encountered multiple dots
    if ($arrayCount > 2)
    {
        print "I dont know how to get filename and ext ... $recordLine ... \n";
    }
    else
    {   
        print "FileName: $fields[0] ... Ext: $fields[1] ... \n";
    }

}#end while-loop

print "\nPerl End ... \n\n"; 

1;

Here's the output:

Perl Starting ...

FileName: test word document ... Ext: docx ...
I dont know how to get filename and ext ... amazing c. document.docx ...
I dont know how to get filename and ext ... 1. 2. 3.45 document.docx ...

Perl End ...

What I would like to get

FileName: test word document ... Ext: docx ...
FileName: amazing c. document ... Ext: docx ...
FileName: 1. 2. 3.45 document ... Ext: docx ...

Upvotes: 2

Views: 3558

Answers (2)

Dave Cross

Reputation: 69294

This is what File::Basename is for.

#!/usr/bin/perl

use strict;
use warnings;
use feature 'say';

use File::Basename;

while (<DATA>) {
  chomp;
  my ($name, undef, $ext) = fileparse($_, '.docx');

  say "Filename: $name ... Ext: $ext";
}

__DATA__
test word document.docx
amazing c. document.docx
1. 2. 3.45 document.docx

Three things are worth explaining.

  1. I use the DATA filehandle as this is a demonstration and it's easier than having a separate input file.
  2. fileparse() returns the directory path as the second value. As this data doesn't include directory paths, I've ignored that value (by assigning it to undef).
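  3. The second (and subsequent) parameters to fileparse() are a list of extensions to separate out. You only use one extension in your sample data. If you had more extensions, you could just add them after ".docx". Note that the returned extension includes the leading dot (".docx"), because fileparse() returns the suffix exactly as it matched.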
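
If the extensions are not known in advance, fileparse() also accepts a regular expression as a suffix pattern. The following is a minimal sketch, assuming the extension is whatever follows the last dot; the helper name split_name is hypothetical and not part of File::Basename.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper: split a name into base name and extension
# (extension returned without its leading dot).
sub split_name {
    my ($file) = @_;

    # qr/\.[^.]*/ matches a dot followed by non-dot characters.
    # fileparse() only matches suffix patterns at the end of the name,
    # so only the text after the last dot is treated as the suffix.
    my ($name, undef, $suffix) = fileparse($file, qr/\.[^.]*/);

    $suffix =~ s/^\.//;    # drop the leading dot
    return ($name, $suffix);
}

for my $file ('test word document.docx',
              'amazing c. document.docx',
              '1. 2. 3.45 document.docx') {
    my ($name, $ext) = split_name($file);
    print "FileName: $name ... Ext: $ext ...\n";
}
```

With this pattern there is no need to list each extension explicitly, at the cost of treating anything after the last dot as an extension.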

Upvotes: 5

ceving

Reputation: 23866

Don't use split.

Just use a regular expression match:

#! /usr/bin/perl
use strict;
use warnings;

print "Perl Starting ... \n\n"; 

open my $input_filehandle1, '<', 'test1.txt' or die "No input Filename Found test1.txt ... \n";

while (defined(my $recordLine = <$input_filehandle1>))
{
    chomp($recordLine);

    if ($recordLine =~ /^(.*)\.([^.]+)$/) {
      print "FileName: $1 ... Ext: $2 ... \n";
    }

}#end while-loop

print "\nPerl End ... \n\n"; 

1;

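Regexper explains the regular expression. In short: (.*) is greedy, so it captures everything up to the last dot, and ([^.]+) captures the extension, which must contain at least one character and no dots.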
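
Lines that do not match the pattern (for example, a name with no dot at all) are silently skipped by the loop above. The following is a small sketch of the same regular expression wrapped in a function that falls back to an empty extension in that case; the name name_and_ext and the fallback behaviour are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (name, extension), or (name, '') when the
# string has no dot followed by at least one non-dot character.
sub name_and_ext {
    my ($file) = @_;

    # (.*) is greedy, so it consumes everything up to the LAST dot;
    # ([^.]+) then captures the dot-free extension.
    return ($1, $2) if $file =~ /^(.*)\.([^.]+)$/;

    return ($file, '');
}

for my $file ('1. 2. 3.45 document.docx', 'README') {
    my ($name, $ext) = name_and_ext($file);
    print "FileName: $name ... Ext: $ext ...\n";
}
```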

Upvotes: 4
