Reputation: 12036
I have some directories with the following structure:
DAY1/ # Files under this directory should have DAY1 in the name.
|-- Date
| |-- dir1 # Something wrong here, there are files with DAY2 and files with DAY1.
| |-- dir2
| |-- dir3
| |-- dir4
DAY2/ # Files under this directory should all have DAY2 in the name.
|-- Date
| |-- dir1
| |-- dir2 # Something wrong here, there are files with DAY2, and files with DAY1.
| |-- dir3
| |-- dir4
In each dir there are hundreds of thousands of files with names containing DAY, for example 0.0000.DAY1.01927492. Files with DAY1 in the name should only appear under the parent directory DAY1.
Something went wrong when copying files around, so I now have a mix of files with DAY1 and DAY2 in some of the dir directories.
I wrote a script to find folders that contain mixed files, so I can then look at them more closely. My script is the following:
for directory in */; do
    if ls "$directory" | grep -q DAY2; then
        if ls "$directory" | grep -q DAY1; then
            echo "mixed files in $directory"
        fi
    fi
done
The problem here is that I'm going through all the files twice, which doesn't make sense considering that one pass should be enough.
What would be a more efficient way to achieve what I want?
Upvotes: 1
Views: 61
Reputation: 183554
Given that the difference between going through them once and going through them twice is just a factor-of-two difference, changing to an approach that goes through them only once might actually not be a win, since the new approach might easily take twice as long per file.
So you'll definitely want to experiment; it's not necessarily something that you can confidently reason about.
However, I will say that in addition to going through the files twice, the ls version also sorts the files, which probably has a more-than-linear cost (unless it's doing some kind of bucket sort). Eliminating that, by writing ls --sort=none instead of just ls, will actually improve your algorithmic complexity, and is almost certain to give a tangible improvement.
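For illustration, here is the asker's loop with the sort disabled. This is a sketch assuming GNU ls (--sort=none is a GNU long option); the small demo tree it builds is my own addition so the script can be run as-is:

```shell
#!/bin/bash
# Build a tiny throwaway example tree so this sketch is self-contained.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir dir1 dir2
touch dir1/0.0000.DAY1.01 dir1/0.0001.DAY2.02   # dir1 is mixed
touch dir2/0.0000.DAY1.03                        # dir2 is clean

# Same two-grep structure as the question, but ls no longer sorts,
# so each listing is linear in the number of files.
result=$(
    for directory in */; do
        if ls --sort=none "$directory" | grep -q DAY2; then
            if ls --sort=none "$directory" | grep -q DAY1; then
                echo "mixed files in $directory"
            fi
        fi
    done
)
echo "$result"
```

With GNU ls, -U is an equivalent short spelling of --sort=none.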
But FWIW, here's a version that only goes through the files once, that you can try:
for directory in */; do
    find "$directory" -maxdepth 1 \( -name '*DAY1*' -or -name '*DAY2*' \) -print0 \
      | { saw_day1=
          saw_day2=
          while IFS= read -r -d '' file; do
              if [[ "$file" == *DAY1* ]]; then
                  saw_day1=1
              fi
              if [[ "$file" == *DAY2* ]]; then
                  saw_day2=1
              fi
              if [[ "$saw_day1" ]] && [[ "$saw_day2" ]]; then
                  echo "mixed files in $directory"
                  break
              fi
          done
        }
done
Upvotes: 1
Reputation: 42117
If I understand you correctly, you need to recursively find the files under the DAY1 directory that have DAY2 in their names, and similarly, for the DAY2 directory, the files that have DAY1 in their names.
If so, for the DAY1 directory:
find DAY1/ -type f -name '*DAY2*'
This will get you the files under the DAY1 directory that have DAY2 in their names. Similarly, for the DAY2 directory:
find DAY2/ -type f -name '*DAY1*'
Both are recursive operations.
To get the directory names only:
find DAY1/ -type f -name '*DAY2*' -exec dirname {} +
Note that the current working directory ($PWD) will be shown as "." in the output.
To get unique directory names, pipe the output to sort -u:
find DAY1/ -type f -name '*DAY2*' -exec dirname {} + | sort -u
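Putting both checks together, a minimal sketch; the loop over the two day names and the demo tree it creates are my additions, while the find | sort -u pipeline is exactly the one above:

```shell
#!/bin/bash
# Build a small demo tree: one misplaced file under DAY1, one correct file under DAY2.
tmp=$(mktemp -d)
mkdir -p "$tmp/DAY1/Date/dir1" "$tmp/DAY2/Date/dir2"
touch "$tmp/DAY1/Date/dir1/0.0000.DAY2.01"   # DAY2 file under DAY1: misplaced
touch "$tmp/DAY2/Date/dir2/0.0000.DAY2.02"   # DAY2 file under DAY2: correct
cd "$tmp" || exit 1

# For each top-level day directory, list the subdirectories holding
# files tagged with the *other* day.
result=$(
    for day in DAY1 DAY2; do
        other=$([ "$day" = DAY1 ] && echo DAY2 || echo DAY1)
        find "$day/" -type f -name "*$other*" -exec dirname {} + | sort -u
    done
)
echo "$result"
```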
Upvotes: 2