Reputation: 85
I am currently in my second year of college, so my programming skills and knowledge are not as strong as I'd like them to be. I am doing an internship at a web development company during my summer break, and I am completely stumped by the first task that was assigned to me. That's why I'm here asking for some assistance.
In a main folder there are many sub-folders, and within each sub-folder there are many .js, .cs, and .php files - about 1000 files in total, of which roughly 300 are not being used. I need to go through each of the sub-folders and check whether each of these files is used/called by any other file. If it is not, I need to store the location of the unused file in a text file.
I did some research and found that the command grep -r filename *
does just that, but on the command line I cannot figure out how to loop through the folders and change the filename based on the contents of each folder. My workstation runs Windows with Cygwin installed.
Upvotes: 2
Views: 178
Reputation: 107080
Doesn't this require a double loop (i.e. O(N²))? You have to search every file for a reference to every other file.
I'd use Perl instead of Awk or BASH (although it is possible to do in BASH).
#! /usr/bin/env perl
use warnings;
use strict;
use feature qw(say);
use File::Find;      # Not crazy about File::Find, but it's a standard module
use File::Basename;

my %fileHash;
my @dirs = qw(foo bar barfu fufu barbar);   # List of the directories you're searching

# Find the names of all the files. Include ALL files and not just .php, etc.
find(\&wanted, @dirs);

sub wanted {
    return if (-d $File::Find::name);   # Skip directories
    $fileHash{$File::Find::name} = 0;   # Number of times the file is referenced
}

# Outer loop: foreach file you have to parse
foreach my $fileName (keys %fileHash) {

    # We don't have to grep through anything except the types below.
    (my $suffix = $fileName) =~ s/.*\.//;
    next unless ($suffix eq "js" or $suffix eq "cs" or $suffix eq "php");

    # Slurp the file into an array. That way, we can use the grep command.
    open my $fh, '<', $fileName or die qq(Can't open "$fileName" for reading: $!\n);
    my @lines = <$fh>;
    close $fh;

    # Now, look for each and every file you've got in that directory tree
    # in this particular file. This is the inner loop.
    foreach my $fileToFind (keys %fileHash) {
        my $basename = basename($fileToFind);

        # If any line in the file contains the file name, increment the hash.
        if (grep /\Q$basename\E/, @lines) {
            $fileHash{$fileToFind} += 1;
        }
    }
}

# Now just print out those files that never got incremented (i.e. never referenced)
foreach my $fileName (keys %fileHash) {
    next if ($fileHash{$fileName} != 0);
    say "File: $fileName";
}
I'm taking the shortcut of looking just for the file's basename and not the full name. In theory, I should be looking for both its full name from the root and its name relative to the referencing file. However, I'm too lazy to do that right now. Most likely, you don't have to worry about that.
Upvotes: 1
Reputation: 28864
echo file,count > results.csv
for f in $(find . -name '*.js' -o -name '*.cs' -o -name '*.php')
do
    echo "$f,$(grep -r "$(basename "$f")" * | wc -l)" >> results.csv
done
This will give you a CSV file like the one below, with the number of times each file is referenced:
file,count
file1,3
file2,1
file3,0
edited to remove file path before grepping
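Since the original question also asks for the locations of the unused files to end up in a text file, one extra step on top of the results.csv produced above (just a sketch; unused.txt is an example output name) could be:
awk -F, 'NR > 1 && $2 == 0 {print $1}' results.csv > unused.txt
This skips the header row and prints the first column (the file path) whenever the reference count is 0.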
Upvotes: 1
Reputation: 892
This is only a draft; you'll need to research these commands and work out your own logic...
for file in $(find -type f -name \*.extension); do
    grep -Rl "$file" /in/path
done > /tmp/myfiles
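Building on that draft, a rough way to end up with only the files that are never referenced (still a sketch; /in/path, the extensions, and /tmp/unused are placeholders) is to negate a quiet grep for each file's basename:
for file in $(find /in/path -type f \( -name '*.js' -o -name '*.cs' -o -name '*.php' \)); do
    # -qF: we only care about the exit status, i.e. whether anything mentions this name as a fixed string
    if ! grep -RqF "$(basename "$file")" /in/path; then
        echo "$file"
    fi
done > /tmp/unused
Note that this also matches a file that happens to mention its own name, so it errs on the side of reporting files as used.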
Upvotes: 0
Reputation: 1892
Phew, tricky. At least it is if you have to take the 'being used' bit into consideration.
In the case of .cs files, you can have import statements that won't easily let you conclude whether a file is in use. The import might work at the package level, unless I'm mistaken (I'm more of a Java guy...).
And I assume it gets worse for JavaScript and PHP files.
Maybe you should ask why that report is valuable in the first place?
Upvotes: 0