Given two directory trees, how do I find which filenames are the same, considering only filenames that satisfy a condition?

Question

This answer tells me how to find the files with the same filename in two directories in bash:

diff -srq dir1/ dir2/ | grep identical

Now I want to consider only files that satisfy a condition. For example, ls E* lists the files starting with E. I want to do the same with the above command: give me the filenames which are the same in dir1/ and dir2/, but consider only those starting with E.

I tried the following:

diff -srq dir1/E* dir2/E* | grep identical

but it did not work, I got this output:

diff: extra operand '/home/pal/konkoly/c6/elesbe3/1/EPIC_212291374- c06-k2sc.dat.flag.spline'
diff: Try 'diff --help' for more information.

(/home/pal/konkoly/c6/elesbe3/1/EPIC_212291374- c06-k2sc.dat.flag.spline is a file in what I call dir1, but there is no EPIC_212291374- c06-k2sc.dat.flag.spline in what I call dir2.)

How can I solve this?

I tried doing it in the following way, based on this answer:

DIR1=$(ls dir1)
DIR2=$(ls dir2)

for i in $DIR1; do
    for j in $DIR2; do
        if [[ $i == $j ]]; then
            echo "$i == $j"
        fi
    done
done

This gives the same result as the command above, but if I write DIR1=$(ls path1/E*) and DIR2=$(ls path2/E*) instead, it does not work: I get no output.

melpomene · Accepted Answer

This is untested, but I'd try something like:

comm -12 <(cd dir1 && ls E*) <(cd dir2 && ls E*)

Basic idea:

Generate a list of filenames in dir1 that satisfy our condition. This can be done with ls E* because we're only dealing with a flat list of files. For subdirectories and recursion we'd use find instead (e.g. find . -name 'E*' -type f).
Put the filenames in a canonical order (e.g. by sorting them). We don't have to do anything here because E* expands in sorted order anyway. With find we might have to pipe the output into sort first.
Do the same thing to dir2.
Only output lines that are common to both lists, which can be done with comm -12.
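
For the recursive case mentioned in the first step, the find variant might look like the sketch below. The helper names list_matching and common_names are made up for illustration. Note the assumptions: it compares paths relative to each directory (not bare filenames), it sorts in the C locale so that sort and comm agree on the ordering, and it assumes no filename contains a newline.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list regular files under directory "$1" whose names
# match the pattern "$2", as paths relative to "$1", sorted in the C locale.
list_matching() {
    (cd "$1" && find . -type f -name "$2" | LC_ALL=C sort)
}

# Hypothetical helper: print the relative paths that appear in both trees.
# $1 and $2 are the two directories, $3 is the filename pattern.
common_names() {
    LC_ALL=C comm -12 <(list_matching "$1" "$3") <(list_matching "$2" "$3")
}

# Example call (quote the pattern so the shell does not expand it early):
#   common_names dir1 dir2 'E*'
```

Quoting the pattern matters here: an unquoted E* would be expanded by the shell in the current directory before find ever sees it.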

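comm expects to be passed two filenames on the command line, so we use bash's process substitution, <( ... ), which runs a command in a subprocess and connects its output to a named pipe (or a /dev/fd entry, depending on the system); bash then passes the name of that pipe to comm in place of the <( ... ) expression.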
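
To make the role of <( ... ) concrete, here is a small sketch that runs the same comparison twice: once with explicit temporary files and once with process substitution. The scratch directory and the file names (E_a.dat and so on) are invented for the demonstration; nothing here touches real data.

```bash
#!/usr/bin/env bash
# Scratch directories with made-up file names, purely for illustration.
work=$(mktemp -d)
mkdir "$work/dir1" "$work/dir2"
touch "$work/dir1/E_a.dat" "$work/dir1/E_b.dat" "$work/dir1/other.dat"
touch "$work/dir2/E_b.dat" "$work/dir2/E_c.dat" "$work/dir2/other.dat"

# Long form: write each sorted list to a temporary file, then compare them.
(cd "$work/dir1" && ls E*) > "$work/list1"
(cd "$work/dir2" && ls E*) > "$work/list2"
long_form=$(comm -12 "$work/list1" "$work/list2")

# Short form: process substitution gives comm two readable paths directly,
# so no temporary files are needed.
short_form=$(comm -12 <(cd "$work/dir1" && ls E*) <(cd "$work/dir2" && ls E*))

echo "$long_form"     # E_b.dat
echo "$short_form"    # E_b.dat

rm -rf "$work"
```

Both forms print only E_b.dat, the one name starting with E that exists in both directories; other.dat is present in both but is filtered out by the E* pattern.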
