Reputation: 11
I have developed the below shell script for finding duplicate part of filename and delete it. Similiarly i need to prepare in Perl script also as perl script text processing takes lesser time.
#!/bin/bash
for i in `ls -t *xml|awk 'BEGIN{FS="_"}{if (++dup[$1] >= 2) print}'`;
do
rm $i
done
I have to prepare my perl script my code in such a way that only recent modified filename patterns for example
File 1: AAA_555_0000
File 2: AAAA_123_123
File 3: AAAA_452_452 [latest]
File 4: BBB_555_0000
File 5: BBB_555_555
File 6: BBB_999_999 [latest]
File 7: CCC_555_0000
File 8: CCC_000_000
File 9: CCC_000_111 [latest]
Perl Script has to pick latest file in all the filename patterns (means part of filename) in the folder and it should compare and delete the duplicates. For example: script has to pick latest file in the AAA filename pattern and it has to compare with other AAA pattern if pattern found it has to delete. in the end only latest file in all filename pattern has to remain.
Appreciate if you can help me with this logic.
Thanks much!
Upvotes: 0
Views: 83
Reputation: 2154
Your question is a bit unclear to me, because I am not sure how you determine the order of your files to tell which one is newer. This code will do what I think you need:
my $dir = shift || '.';
opendir(my $dh, $dir);
my @files = sort grep !/^\./, readdir($dh);
my $last;
my @batch;
foreach my $f (@files) {
my @parts = split /_/, $f;
if( !$last ) {
$last = $parts[0];
push @batch, [ @parts ];
}
elsif( index($last, $parts[0]) != -1 ) {
push @batch, [ @parts ];
}
else {
delete_files(@batch);
@batch = ([ @parts ]);
$last = $parts[0];
}
}
delete_files(@batch);
sub delete_files {
my @batch = @_;
@batch = sort {
$a->[0] cmp $b->[0] ||
$a->[1] cmp $b->[1] ||
$a->[2] cmp $b->[2]
} @batch;
pop @batch;
map { print "Delete: ", join('_', @$_), "\n"; } @batch;
}
It assumes that a batch of files is one in which files have the same prefix (up to the first '_' character). When two prefixes have a different length, then the common length has to match.
It also assumes that "version numbers" (text bits separated by '_') should be compared, with the leftmost being the most significative.
Given those assumptions, the code, pointed at a directory with the files you mention, outputs:
# latest.pl <dir>
Delete: AAA_555_0000
Delete: AAAA_123_123
Delete: BBB_555_0000
Delete: BBB_555_555
Delete: CCC_000_000
Delete: CCC_000_111
The unclear part is why you think that file 7 is not the latest...
Once the order is clear, you may change the line:
map { print "Delete: ", join('_', @$_), "\n"; } @batch;
With:
map { unlink join('_', @$_); } @batch;
So that it will delete the files.
You may play with the sorting algorthm in the sub
, which is the one that determines which files to delete. Now it compares the file name parts from left to right as strings. You may use <=>
to compare as numbers instead of cmp
where necessary.
Upvotes: 1