Reputation: 440

How to check for files that has two different extensions in Perl

I have a file reflog with the content as below. There will be items with same name but different extensions. I want to check that for each of the items (file1, file2 & file3 here as example), it needs to be exist in both extensions (.abc and .def). If both extensions exist, it will perform some regex and print out. Else it will just report out with the file name together with extension (ie, if only on of file1.abc or file1.def exists, it will be printed out).

reflog:

file1.abc


file2.abc

file2.def 

file3.abc
file3.def

file4.abc 

file5.abc 
file5.def
file6.def
file8abc.def
file7.abc

file1.def
file9abc.def
file10def.abc

My script is as below (editted from yb007 script), but I have some issues with the output that I don;t know how to resolve. I notice the output is going to be wrong when the reflog file having any file with the name *abc.def (such as ie. file8abc.def & file9abc.def). It will be trim down the last 4 suffix and return the wrong .ext (which is .abc here but I suppose it should be .def).

    #! /usr/bin/perl 
    use strict; 
    use warnings; 
    my @files_abc ;
    my @files_def ;
    my $line;
    open(FILE1, 'reflog') || die ("Could not open reflog") ;
    open (FILE2, '>log') || die ("Could not open log") ;
    while ($line = <FILE1>) {   
        if($line=~ /(.*).abc/) {       
            push(@files_abc,$1);   
        } elsif ($line=~ /(.*).def/) { 
            push(@files_def,$1);     } 
    } 
    close(FILE1);

    my %first = map { $_ => 1 } @files_def ;
    my @same = grep { $first{$_} } @files_abc ;
    my @abc_only = grep { !$first{$_} } @files_abc ;
    foreach my $abc (sort @abc_only) {
        $abc .= ".abc";
    }   

    my %second = map {$_=>1} @files_abc; 
    my @same2 = grep { $second{$_} } @files_def; #@same and same2 are equal.
    my @def_only = grep { !$second{$_} } @files_def;
    foreach my $def (sort @def_only) {
        $def .= ".def";
    }

    my @combine_all = sort (@same, @abc_only, @def_only);
    print "\nCombine all:-\n @combine_all\n" ;
    print "\nList of files with same extension\n @same";  
    print "\nList of files with abc only\n @abc_only"; 
    print "\nList of files with def only\n @def_only"; 
    foreach my $item (sort @combine_all) {
        print FILE2 "$item\n" ;
    }
    close (FILE2) ;

My output is like this which is wrong:- 1st:- print screen output as below: Combine all:- file.abc file.abc file1 file10def.abc file2 file3 file4.abc file5 file6.def file7.abc

List of files with same extension
 file1 file2 file3 file5
List of files with abc only
 file4.abc file.abc file7.abc file.abc file10def.abc
List of files with def only
 file6.def

Log output as below:
    **file.abc
    file.abc**
    file1
    file10def.abc
    file2
    file3
    file4.abc
    file5
    file6.def
    file7.abc

Can you pls help me take a look where gies wrong? Thanks heaps.

Upvotes: 1

Answers (5)

yb007

Reputation: 1377

use strict;
use warnings;

my @files_abc;
my @files_def;
my $line;

open(FILE,'reflog') || die ("could not open reflog");

while ($line = <FILE>) {
    if($line=~ /(.*)\.abc/) {
        push(@files_abc,$1);
    }
    elsif($line=~ /(.*)\.def/) {
        push(@files_def,$1);
    }
}

close(FILE);

my %second = map {$_=>1} @files_def;
my @same = grep { $second{$_} } @files_abc;

print "\nList of files with same extension\n @same";


foreach my $abc (@files_abc) {
           $abc .= ".abc";
         }
foreach my $def (@files_def) {
           $def .= ".def";
         }

print "\nList of files with abc extension\n @files_abc";
print "\nList of files with def extension\n @files_def";

Output is

List of files with same extension
file1 file2 file3 file5

List of files with abc extension
file1.abc file2.abc file3.abc file4.abc file5.abc file7.abc file10def.abc

List of files with def extension
file2.def file3.def file5.def file6.def file8abc.def file1.def file9abc.def

Hope this helps...

Upvotes: 0

Jonathan Leffler

Reputation: 755094

You don't need to slurp the whole file; you can read one line at a time. I think this code works on this extended version of your reflog file:

xx.pl

#!/usr/bin/env perl

use strict;
use warnings;

open my $file, '<', "reflog" or die "Failed to open file reflog for reading ($!)";
open my $func, '>', 'log'    or die "Failed to create file log for writing ($!)";

my ($oldline, $oldname, $oldextn) = ("", "", "");
while (my $newline = <$file>)
{
    chomp $newline;
    $newline =~ s/^\s*//;
    my ($newname, $newextn) = ($newline =~ m/(.*)([.][^.]*)$/);
    if ($oldname eq $newname)
    {
        # Found the same file - presumably $oldextn eq ".abc" and $newextn eq ".def"
        print $func "$newname\n";
        print "$newname\n";
        $oldline = "";
        $oldname = "";
        $oldextn = "";
    }
    else
    {
        print $func "$oldline\n" if ($oldline);
        print "$oldline\n" if ($oldline);
        $oldline = $newline;
        $oldname = $newname;
        $oldextn = $newextn;
    }
}
print $func "$oldline\n" if ($oldline);
print "$oldline\n" if ($oldline);

#unlink "reflog" ;
chmod 0644, "log";
close $func;
close $file;

Since the code does not actually check the extensions, it would be feasible to omit $oldextn and $newextn; on the other hand, you might well want to check the extensions if you're sufficiently worried about the input format to need to deal with leading white space.

I very seldom find it good for a processing script like this to remove its own input, hence I've left unlink "reflog"; commented out; your mileage may vary. I would also often just read from standard input and write to standard output; that would simplify the code quite a bit. This code writes to both the log file and to standard output; obviously, you can omit either output stream. I was too lazy to write a function to handle the writing, so the print statements come in pairs.

This is a variant on control-break reporting.

reflog

file1.abc
file1.def
file2.abc
file2.def
file3.abc
file3.def
file4.abc
file5.abc
file5.def
file6.def
file7.abc

Output

$ perl xx.pl
file1
file2
file3
file4.abc
file5
file6.def
file7.abc
$ cat log
file1
file2
file3
file4.abc
file5
file6.def
file7.abc
$

To handle unsorted file names with blank lines

#!/usr/bin/env perl

use strict;
use warnings;

open my $file, '<', "reflog" or die "Failed to open file reflog for reading ($!)";
open my $func, '>', 'log'    or die "Failed to create file log for writing ($!)";

my @lines;

while (<$file>)
{
    chomp;
    next if m/^\s*$/;
    push @lines, $_;
}

@lines = sort @lines;

my ($oldline, $oldname, $oldextn) = ("", "", "");
foreach my $newline (@lines)
{
    chomp $newline;
    $newline =~ s/^\s*//;
    my ($newname, $newextn) = ($newline =~ m/(.*)([.][^.]*)$/);
    if ($oldname eq $newname)
    {
        # Found the same file - presumably $oldextn eq ".abc" and $newextn eq ".def"
        print $func "$newname\n";
        print "$newname\n";
        $oldline = "";
        $oldname = "";
        $oldextn = "";
    }
    else
    {
        print $func "$oldline\n" if ($oldline);
        print "$oldline\n" if ($oldline);
        $oldline = $newline;
        $oldname = $newname;
        $oldextn = $newextn;
    }
}
print $func "$oldline\n" if ($oldline);
print "$oldline\n" if ($oldline);

#unlink "reflog" ;
chmod 0644, "log";
close $func;
close $file;

This is very similar to the original code I posted. The new lines are these:

my @lines;

while (<$file>)
{
    chomp;
    next if m/^\s*$/;
    push @lines, $_;
}

@lines = sort @lines;

my ($oldline, $oldname, $oldextn) = ("", "", "");    # Old
foreach my $newline (@lines)

This reads the 'reflog' file, skipping blank lines, saving the rest in the @lines array. When the lines are all read, they're sorted. Then, instead of a loop reading from the file, the new code reads entries from the sorted array of lines. The rest of the processing is as before. For your described input file, the output is:

file1
file2
file3

Urgh: the chomp $newline; is not needed, though it is not otherwise harmful. The old-fashioned chop (a precursor to chomp) would have been dangerous. Score one for modern Perl.

Upvotes: 0

Pradeep

Reputation: 3153

open( FILE, "reflog" );
open( FUNC, '>log' );
my %seen;
while ( chomp( my $line = <FILE> ) ) {
    $line =~ s/^\s*//;
    if ( $ine =~ /(\.+)\.(abc|def)$/ ) {
        $seen{$1}++;
    }
}

foreach my $file ( keys %seen ) {
    if ( $seen{$file} > 1 ) {
        ## do whatever you want to
    }
}
unlink "reflog";
chmod( 0750, "log" );
close(FUNC);
close(FILE);

Upvotes: -1

scotinus

Reputation: 176

Aside from the errors already pointed out, you appear to be loading @lines from FUNC instead of FILE. Is that also a typo?

Also, If reflog truly contains a series of lines with one filename on each line, why would you ever expect the conditional "if ($line =~ /.abc/ && $line =~ /.def/)" to evaluate true?

It would really help if you could post an example from the actual file you are reading from, along with the actual code you are debugging. Or at least edit the question to fix the typos already mentioned

Upvotes: 1

Borodin

Reputation: 126772

ALWAYS add

use strict;
use warnings;

to the head of your program. They will catch most simple errors before you need to ask for help.

You should always check whether a file open succeeded with open FILE, "reflog" or die $!;
You are using a variable $ine that doesn't exist. You mean $line
The lines you read into the array contain a trailing newline. Write chomp @lines; to remove them
Your regular expressions are wrong and you need || instead of &&. Instead write if ($line =~ /\.(iif|isp)$/)

If you still have problems when these are fixed then please ask again.

Upvotes: 4