Brett

Reputation: 12007

Bash - How can I archive and compress files in subdirectories but only with a certain filename

I have a directory structure that looks like:

main_directory/
    directory1:
        sub_directory1:
            files:
                myfile.txt
                otherfile.txt
        sub_directory2:
            files:
                myfile.txt
                otherfile.txt
        sub_directory3:
            files:
                myfile.txt
                otherfile.txt
        sub_directory4:
            files:
                myfile.txt
                otherfile.txt
    directory2:
        sub_directory1:
            files:
                myfile.txt
                otherfile.txt
        sub_directory2:
            files:
                myfile.txt
                otherfile.txt
        sub_directory3:
            files:
                myfile.txt
                otherfile.txt
        sub_directory4:
            files:
                myfile.txt
                otherfile.txt

I am trying to figure out (by trial and error, because I'm not a Linux expert) how to gzip only the myfile.txt files in all the directories. Since they all have the same filename in different paths (there was no way around this), I need to keep each file's path in the archive as well. So the final gzipped tar file I am looking to create would have the contents:

mytar.tar.gz
    main_directory/directory1/sub_directory1/files/myfile.txt
    main_directory/directory1/sub_directory2/files/myfile.txt
    main_directory/directory1/sub_directory3/files/myfile.txt
    main_directory/directory1/sub_directory4/files/myfile.txt
    main_directory/directory2/sub_directory1/files/myfile.txt
    main_directory/directory2/sub_directory2/files/myfile.txt
    main_directory/directory2/sub_directory3/files/myfile.txt
    main_directory/directory2/sub_directory4/files/myfile.txt

Is there a simple bash way to do this? I suppose I could write a python script to do it, but that seems overkill.

Does anyone have any advice?

Upvotes: 4

Views: 302

Answers (4)

tripleee

Reputation: 189397

If the directory structure is indeed this regular, the wildcard

main_directory/*/*/files/myfile.txt

will match the files you want. However, if there are many files, you may need to resort to find / xargs in order to avoid the "argument list too long" (ARG_MAX) problem, since the shell expands the wildcard into one command line.
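For the regular layout in the question, the wildcard can be passed straight to tar. The sketch below is illustrative only: it builds a scratch copy of the tree in a temporary directory (the temp directory and file contents are assumptions, not part of the question) and then archives just the myfile.txt files.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build a scratch copy of the layout from the question (hypothetical test data).
workdir=$(mktemp -d)
cd "$workdir"
for d in directory1 directory2; do
  for s in sub_directory1 sub_directory2 sub_directory3 sub_directory4; do
    mkdir -p "main_directory/$d/$s/files"
    echo "keep" > "main_directory/$d/$s/files/myfile.txt"
    echo "skip" > "main_directory/$d/$s/files/otherfile.txt"
  done
done

# The shell expands the wildcard; tar stores each path exactly as given.
tar -czf mytar.tar.gz main_directory/*/*/files/myfile.txt

# List the archive to confirm which entries were stored.
tar -tzf mytar.tar.gz
```

Because the paths are given relative to the current directory, they are preserved inside the archive, which is what keeps the identically named files apart.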

If there are files named myfile.txt which you do not want to include because their path does not match the wildcard exactly, there are certainly ways to exclude them from find, too; perhaps then this additional constraint should be stated in the question.
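One possible way to express such a constraint with find is sketched here. It assumes a find that supports -mindepth and -maxdepth (GNU and BSD find do; they are not in POSIX) and a tar that accepts --null with -T. The stray files in the scratch tree are hypothetical examples of paths that should be left out.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch tree (hypothetical): the regular layout plus two stray myfile.txt files.
workdir=$(mktemp -d)
cd "$workdir"
for d in directory1 directory2; do
  for s in sub_directory1 sub_directory2 sub_directory3 sub_directory4; do
    mkdir -p "main_directory/$d/$s/files"
    echo "keep" > "main_directory/$d/$s/files/myfile.txt"
  done
done
echo "stray" > main_directory/directory1/myfile.txt
mkdir -p main_directory/directory1/sub_directory1/files/backup
echo "stray" > main_directory/directory1/sub_directory1/files/backup/myfile.txt

# Only accept files exactly four levels below main_directory, inside a "files" directory.
find main_directory -mindepth 4 -maxdepth 4 -path '*/files/myfile.txt' -print0 |
  tar -czf mytar.tar.gz --null -T -

tar -tzf mytar.tar.gz
```

The depth limits are used because find's -path pattern lets * match across slashes, so a pattern alone would not pin the depth.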

Upvotes: 0

Emil Sit

Reputation: 23542

Assuming there are not too many files, you can do something like:

cd main_directory/..
find main_directory -name "myfile.txt" | xargs tar zcf mytar.tar.gz

In the event that there are a lot of files, you can pipe the file list into a file/stream and pass that into tar.

find main_directory -name "myfile.txt" -print0 | tar zcf mytar.tar.gz --null -T -

This prints the filenames separated by null characters (-print0 to find) and instructs tar to read that list from stdin (--null -T -); using nulls ensures that any special characters in directory names are handled properly.
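A small self-contained run of the null-delimited pipeline is sketched below, assuming a tar that supports --null together with -T (GNU tar and bsdtar do). The scratch tree, including the directory name with a space in it, is made up for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch tree for illustration only; "sub directory" contains a space on purpose.
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p "main_directory/directory1/sub directory/files"
mkdir -p "main_directory/directory2/sub_directory1/files"
echo "keep" > "main_directory/directory1/sub directory/files/myfile.txt"
echo "keep" > "main_directory/directory2/sub_directory1/files/myfile.txt"
echo "skip" > "main_directory/directory2/sub_directory1/files/otherfile.txt"

# find emits NUL-terminated paths; tar reads them from stdin (-T -) as NUL-separated names.
find main_directory -name "myfile.txt" -print0 | tar -czf mytar.tar.gz --null -T -

# List the archive contents.
tar -tzf mytar.tar.gz
```

Since tar receives the names on stdin rather than as arguments, the length of the file list is not limited by the command-line size.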

Upvotes: 2

Brett

Reputation: 12007

This overcame the issue described in the other answer:

find main_directory/ -name "myfile.txt" | tar -czvf mytar.tar.gz -T -

Upvotes: 4

Etan Reisner

Reputation: 80931

With a new enough version of bash (4.0.0+, I believe) and a number of other shells, the following will work; in bash the globstar option has to be enabled first (shopt -s globstar):

tar -czf mytar.tar.gz main_directory/**/myfile.txt
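As a rough sketch of the recursive-glob approach in a script, assuming bash 4 or later (on older versions shopt -s globstar fails): the scratch tree is hypothetical test data mirroring the question's layout.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch copy of the layout from the question (hypothetical test data).
workdir=$(mktemp -d)
cd "$workdir"
for d in directory1 directory2; do
  for s in sub_directory1 sub_directory2 sub_directory3 sub_directory4; do
    mkdir -p "main_directory/$d/$s/files"
    echo "keep" > "main_directory/$d/$s/files/myfile.txt"
    echo "skip" > "main_directory/$d/$s/files/otherfile.txt"
  done
done

# Without globstar, ** behaves like a single *; enable it so ** recurses.
shopt -s globstar

tar -czf mytar.tar.gz main_directory/**/myfile.txt

tar -tzf mytar.tar.gz
```

Note that the pattern matches myfile.txt at any depth under main_directory, and, like any glob, it is expanded onto the command line, so very large trees can still hit the argument-length limit.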

Upvotes: 0
