Reputation: 27608
I have an input file called test1.txt containing thousands of file names, for example:
test word document.docx
...
...
amazing c. document.docx
1. 2. 3.45 document.docx
...
...
What I want to do is get the file name and the extension out of each string. For the majority of the file names there is only one dot, so I can get the file name and extension by using the dot as a separator. The problem is that some file names contain multiple dots, and I have no idea how to get the extension and file name out of those.
Here's my Perl code.
use strict;
use warnings;
print "Perl Starting ... \n\n";
open my $input_filehandle1, '<', 'test1.txt' or die "No input Filename Found test1.txt ... \n";
while (defined(my $recordLine = <$input_filehandle1>))
{
chomp($recordLine);
my @fields = split(/\./, $recordLine);
my $arrayCount = @fields;
#if the array size is more than 2 then we encountered multiple dots
if ($arrayCount > 2)
{
print "I dont know how to get filename and ext ... $recordLine ... \n";
}
else
{
print "FileName: $fields[0] ... Ext: $fields[1] ... \n";
}
}#end while-loop
print "\nPerl End ... \n\n";
1;
Here's the output:
Perl Starting ...
FileName: test word document ... Ext: docx ...
I dont know how to get filename and ext ... amazing c. document.docx ...
I dont know how to get filename and ext ... 1. 2. 3.45 document.docx ...
Perl End ...
What I would like to get:
FileName: test word document ... Ext: docx ...
FileName: amazing c. document ... Ext: docx ...
FileName: 1. 2. 3.45 document ... Ext: docx ...
Upvotes: 2
Views: 3558
Reputation: 69294
This is what File::Basename is for.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
use File::Basename;
while (<DATA>) {
chomp;
my ($name, undef, $ext) = fileparse($_, '.docx');
say "Filename: $name ... Ext: $ext";
}
__DATA__
test word document.docx
amazing c. document.docx
1. 2. 3.45 document.docx
Three things are worth explaining:
1. I've used the DATA filehandle because this is a demonstration and it's easier than having a separate input file.
2. fileparse() returns the directory path as the second value. As this data doesn't include directory paths, I've ignored that value (by assigning it to undef).
3. The second and subsequent arguments to fileparse() are a list of extensions to separate out. You only use one extension in your sample data. If you had more extensions, you could just add them after ".docx".
Upvotes: 5
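A possible variation on the fileparse() approach above, in case the list of extensions isn't known in advance: the suffix argument can also be a regular expression. The pattern qr/\.[^.]*/ is the one shown in the File::Basename documentation. The helper name split_name is made up for this sketch, and stripping the leading dot from the suffix is an assumption about the desired output.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename;

# Hypothetical helper: returns (name, extension without the dot).
# fileparse() anchors the suffix pattern at the end of the name, so
# only the last dot and what follows it is split off.
sub split_name {
    my ($file) = @_;
    my ($name, undef, $suffix) = fileparse($file, qr/\.[^.]*/);
    $suffix =~ s/^\.//;    # fileparse() returns the suffix with its dot
    return ($name, $suffix);
}

for my $file ('test word document.docx',
              'amazing c. document.docx',
              '1. 2. 3.45 document.docx') {
    my ($name, $ext) = split_name($file);
    print "FileName: $name ... Ext: $ext ...\n";
}
```

Note that with the literal '.docx' suffix from the answer, $ext comes back as ".docx" (dot included), which is why the sketch strips it.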
Reputation: 23866
Don't use split.
Use just a regular pattern match:
#! /usr/bin/perl
use strict;
use warnings;
print "Perl Starting ... \n\n";
open my $input_filehandle1, '<', 'test1.txt' or die "No input Filename Found test1.txt ... \n";
while (defined(my $recordLine = <$input_filehandle1>))
{
chomp($recordLine);
if ($recordLine =~ /^(.*)\.([^.]+)$/) {
print "FileName: $1 ... Ext: $2 ... \n";
}
}#end while-loop
print "\nPerl End ... \n\n";
1;
Regexper explains the regular expression.
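For anyone reading without the diagram, here is the same pattern spelled out with the /x flag and comments. This is only a sketch: the helper name name_and_ext and the fallback for names that contain no dot are assumptions, not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: split a file name on its LAST dot only.
sub name_and_ext {
    my ($file) = @_;
    if ($file =~ /
            ^ (.*)     # greedy, so it runs up to the last dot
            \.         # the dot separating name and extension
            ([^.]+)    # one or more non-dot characters
            $          # up to the end of the string
        /x) {
        return ($1, $2);
    }
    # Assumption: a name without a usable dot has no extension.
    return ($file, '');
}

for my $file ('amazing c. document.docx', '1. 2. 3.45 document.docx') {
    my ($name, $ext) = name_and_ext($file);
    print "FileName: $name ... Ext: $ext ...\n";
}
```

Because (.*) is greedy, it takes as much as it can and backtracks only far enough for \.([^.]+)$ to match, which is what pins the split to the final dot.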
Upvotes: 4