Reputation: 109
I am trying to develop a script that walks a project's directory structure and source files (mostly Java and XML) looking for file names whose case in the code differs from the case of the actual file on disk, e.g. ABC.xml in the code but Abc.xml on the drive. We found this problem while migrating from Windows to Linux.
I had originally thought of using ack, but the firewall here appears to block CPAN, and installing it manually with dmake kept failing (using the latest version of Strawberry Perl).
Here is what I have put together so far. It recursively searches each subdirectory under the base path for Java and XML files. It then opens each file found and does a case-insensitive search for each name in the search list, followed by a case-sensitive comparison to discard results where the case already matches. The remaining mismatches are to be stored in a hash keyed by source file, with each value an array holding the file names whose case does not match. At the end I plan to print out the hash.
I'm currently having difficulty setting up the hash of arrays, but I'm open to alternative or simpler solutions.
my $source = "C:/sampleSourcefiles";
my $base_path = "C:/baseDIRprojectCode";
my @searchList;
my %report;
#open source file directory.
if($source){
opendir (DIR, $source) or die "Directory not found \n" ;
@searchList = grep(/^.+\..+$/, readdir(DIR));
closedir DIR;
}
# The code refers to these files without extensions, so trim them from the names.
foreach my $file (@searchList){
$file =~ s/\.(?:dat|xml)$//;
#print "$file\n";
}
process_files ($base_path);
# Accepts one argument: the full path to a directory.
sub process_files {
my $path = shift;
# Open the directory.
opendir (DIR, $path) or die "Unable to open $path: $!";
# Read in the files.
my @files = grep {!/^\./} readdir (DIR);
closedir (DIR);
# append the full path to the file names.
@files = map { $path . '/' . $_ } @files;
for (@files) {
# If the file is a directory
if (-d $_) {
process_files ($_);
# If it isn't a directory, process the file.
} else {
file_search($_);
}
}
}
# Accepts one argument: the source file to search
sub file_search {
my $file = shift;
#ignore all files not java or xml
if ($file =~ /\.(?:xml|java)$/){
#search for match to any file in the list
foreach my $item (@searchList){
open(F, $file);
my @lines = <F>;
close F;
my @result = grep /$item/i , @lines;
if (@result){
%report($item, @result);
#foreach my $res (@result){
# if($res eq $file){
# print "good result\n";
# } else {
# print "Inequality match found in file $file for $res\n";
# }
#}
} else {
}
}
}
}
Upvotes: 1
Views: 111
Reputation: 8591
You are well on your way, but there is room for improvement.
First of all: the line
%report($item, @result);
does not make any sense; shouldn't it just be a subroutine call?
report($item, @result);
Second, what do you want to use hashes for?
Third: you're not iterating very efficiently. Why reopen and reread a file for each filename?
It is more efficient to get the list of files first, map their lowercase form to their original form
my %lower2original = map { (lc($_), $_) } @files;
then build one big regular expression that searches for any of them case-insensitively, using the qr operator, something like
my $regex = '\b(' . join('|', map { quotemeta } @files) . ')\b';
$regex = qr/$regex/ip;
and then open each file in turn and scan through it using
# Match in scalar context so that each iteration advances to the next hit in $_
while (/$regex/g)
{
my $match = $1;
my $original = $lower2original{lc($match)};
if ($match ne $original)
{
print "case mismatch: line $. of $file has $match instead of $original\n";
}
}
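Putting these steps together, a minimal sketch might look like the following. The helper name find_case_mismatches and the sample text are assumptions made purely for illustration. It scans a string rather than a file handle to stay self-contained, and it collects the mismatches in a hash of arrays, which is the structure the question was struggling with.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash ref that maps each name as it exists
# on disk to the list of differently cased spellings found in $text.
sub find_case_mismatches {
    my ($text, @names) = @_;
    return {} unless @names;

    # Map the lowercase form of each name to its on-disk spelling.
    my %lower2original = map { (lc($_), $_) } @names;

    # One alternation for all names; quotemeta keeps '.' literal.
    my $alternation = join '|', map { quotemeta($_) } @names;
    my $regex       = qr/\b($alternation)\b/i;

    my %report;
    while ($text =~ /$regex/g) {
        my $match    = $1;
        my $original = $lower2original{ lc $match };
        # Hash of arrays: push onto the array stored under the key.
        push @{ $report{$original} }, $match if $match ne $original;
    }
    return \%report;
}

my $report = find_case_mismatches(
    'load("ABC.xml"); load("Abc.xml"); open("foo.DAT");',
    'Abc.xml', 'Foo.dat',
);
for my $name (sort keys %$report) {
    print "$name is referenced as: @{ $report->{$name} }\n";
}
```

To report per source file instead, the same push idiom works with the source file path as the key, e.g. push @{ $report{$file} }, $match.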
Fourth: I'd use File::Find::Rule to obtain the list of files.
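With File::Find::Rule the call would be along the lines of File::Find::Rule->file->name('*.java', '*.xml')->in($base_path). That module comes from CPAN, though, which the question says is blocked, so here is a rough sketch of the same traversal using the core File::Find module instead. The helper name list_source_files is an assumption for illustration only.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: recursively collects .java and .xml files under $dir.
sub list_source_files {
    my ($dir) = @_;
    my @found;
    find(
        sub {
            # Inside the callback, $_ is the bare file name and
            # $File::Find::name is the path including $dir.
            push @found, $File::Find::name
                if -f $_ && /\.(?:java|xml)$/i;
        },
        $dir,
    );
    @found = sort @found;
    return @found;
}

print "$_\n" for list_source_files('.');
```

This replaces the hand-written recursion in process_files, and the returned list can be fed straight into the scanning loop.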
Upvotes: 0