Reputation: 11
I have a parent directory with over 800 directories, each of which has a unique name. Some of these directories house a sub-directory called y, in which a file called z (if it exists) can be found.
I need to script a loop that will check each of the 800+ directories for z, and if it's there, append the name of the directory (the one before y) to a text file. I'm not sure how to do this.
This is what I have so far:
#!/bin/bash
for d in *; do
if [ -d "y"]; then
for f in *; do
if [ -f "x"]
echo $d >> IDlist.txt
fi
fi
done
Upvotes: 1
Views: 537
Reputation: 17208
The first example doesn't check that z is a file, but I think it's worth showing compgen:
#!/bin/bash
compgen -G '*/y/z' | sed 's|/.*||' > IDlist.txt
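If you do want the file check as well, one option (a minimal sketch, filtering compgen's output through a test before stripping the path) is:
compgen -G '*/y/z' | while IFS= read -r p; do [ -f "$p" ] && printf '%s\n' "${p%%/*}"; done > IDlist.txt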
Doing glob expansion, file check and path splitting with perl only:
perl -E 'foreach $p (glob "*/y/z") {say substr($p, 0, index($p, "/")) if -f $p}' > IDlist.txt
Upvotes: 1
Reputation: 13717
You can first do some filtering using find, which will list all z files recursively within the current directory. Say one of the outputs was
./dir001/y/z
You can then extract the required part in multiple ways: grep, sed, awk, etc. E.g. with grep,
find . -type f | grep z | grep -E -o "y.*$"
will give
y/z
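Note that the question actually asks for the directory name before y rather than the y/z part; a sketch of the same find-based idea that extracts it (assuming the flat layout from the question, with the unique directories directly under the current one):
find . -type f -path '*/y/z' | cut -d/ -f2 > IDlist.txt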
Upvotes: 1
Reputation: 22291
This should do it:
shopt -s nullglob
outfile=IDlist.txt
>$outfile
for found in */y/x
do
[[ -f $found ]] && echo "${found%%/*}" >>$outfile # Drop the /y/x part
done
The nullglob ensures that the loop is skipped if there is no match, and the quotes in the echo ensure that the directory name is output correctly even if it contains two successive spaces.
Upvotes: 1
Reputation: 29280
Let's assume that any foo/y/z is a file (that is, you do not have directories with such names). If you had a really large number of such files, storing all paths in a bash variable could lead to memory issues, which would advocate for another solution, but about 800 paths is not large. So, something like this should be OK:
declare -a names=(*/y/z)
printf '%s\n' "${names[@]%%/*}" > IDlist.txt
Explanation: the paths of all z files are first stored in the array names, thanks to the glob pattern */y/z. Then, a pattern substitution is applied to each array element to suppress the /y/z part: "${names[@]%%/*}". The result is printed, one name per line, by printf '%s\n'.
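For instance, with a hypothetical layout where dir001 and dir002 both contain y/z, the expansion behaves like this:
names=(dir001/y/z dir002/y/z)    # what the glob would produce
printf '%s\n' "${names[@]%%/*}"  # prints dir001 and dir002, one per line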
If you also had directories named z, or if you had millions of files, find could be used instead, with a bit of awk to retain only the leading directory name:
find . -mindepth 3 -maxdepth 3 -path './*/y/z' -type f |
awk -F/ '{print $2}' > IDlist.txt
If you prefer sed for the post-processing:
find . -mindepth 3 -maxdepth 3 -path './*/y/z' -type f |
sed 's|^\./\(.*\)/y/z|\1|' > IDlist.txt
These two are probably also more efficient (faster).
Note: your initial attempt could also work, even if using bash loops is far less efficient, but it needs several changes:
#!/bin/bash
for d in *; do
if [ -d "$d/y" ]; then
for f in "$d"/y/*; do
if [ "$f" = "$d/y/z" ]; then
printf '%s\n' "$d" >> IDlist.txt
fi
done
fi
done
As noted by @LéaGris, printf is better than echo because if d is, for instance, the string -e, echo "$d" interprets it as an option of the echo command and does not print it.
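A quick way to see the pitfall in an interactive bash shell (using the builtin echo):
d='-e'
echo "$d"           # -e is taken as an option: prints an empty line
printf '%s\n' "$d"  # prints -e literally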
But a simpler and more efficient version (even if not as efficient as the first proposal or the find-based ones) would be:
#!/bin/bash
for d in *; do
if [ -f "$d/y/z" ]; then
printf '%s\n' "$d"
fi
done > IDlist.txt
As you can see, there is another improvement (also suggested by @LéaGris), which consists in redirecting the output of the entire loop to the IDlist.txt file. This opens and closes the file only once, instead of once per iteration.
Upvotes: 2
Reputation: 1165
This should solve it:
for f in */y/z; do
  [ -f "$f" ] && echo "${f%%/*}"
done
Note: if there is a possibility of a weird top-level directory name like -e, use printf instead of echo, as sketched below.
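For example (a minimal sketch of the same loop with printf, redirecting the whole loop's output as suggested in the other answer):
for f in */y/z; do
  [ -f "$f" ] && printf '%s\n' "${f%%/*}"
done > IDlist.txt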
Upvotes: 1