Reputation: 11222
I'm having trouble with git clean and its exclude option for nested directories.
I would like to clean all uncommitted files from the repo, excluding the vendor/bundle
directory.
My test repo looks like:
debugg-dir/
    .git/
    file.txt
    not-commited-file
    not-commited-folder/
        another-not-commited-file
    vendor/
        bundle/
            another-not-commited-file
Reproduce test repo with:
git init debugg-dir && cd debugg-dir
touch file.txt && git add . && git commit -m "Commit"
mkdir -p not-commited-folder && touch not-commited-folder/another-not-commited-file
mkdir -p vendor/bundle && touch vendor/bundle/another-not-commited-file && touch not-commited-file
Git clean command:
git clean -d -x -n -e vendor/bundle
After the clean, I expect to have:
debugg-dir/
    .git/
    file.txt
    vendor/
        bundle/
            another-not-commited-file
Is there a proper way to exclude a nested directory from the git clean command?
# EDIT:
Explanation:
There is no "clean" solution for this situation.
git clean excludes directories with git clean -d -x -n -e dir_name,
but this doesn't work with nested directories.
Is this a bug in Git, or is there a good reason for it? More information on why this doesn't work can be found in the source. Long story short, the exclude pattern only works for strings up to the first '/' in the pattern.
My solution:
cd vendor && git clean -dxf -e bundle && cd ..
git clean -dxf -e vendor
With this I managed to keep only the nested directory and its contents.
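The two-step workaround can be checked end to end in a throwaway repository. This is only a minimal sketch, assuming a POSIX shell with git and mktemp available; the repo variable, the temporary location, and the placeholder commit identity are illustrative and not part of the original setup.

```shell
#!/bin/sh
# Sketch: rebuild the test repo in a temporary directory and apply the
# two-step workaround. "repo" and the commit identity are placeholders.
set -e
repo="$(mktemp -d)/debugg-dir"
git init -q "$repo"
cd "$repo"

touch file.txt
git add .
git -c user.name="Test" -c user.email="test@example.com" commit -q -m "Commit"

mkdir -p not-commited-folder vendor/bundle
touch not-commited-file
touch not-commited-folder/another-not-commited-file
touch vendor/bundle/another-not-commited-file

# Step 1: inside vendor/, clean everything except bundle.
(cd vendor && git clean -dxf -e bundle)
# Step 2: at the top level, clean everything except vendor.
git clean -dxf -e vendor

# List the files that are left, skipping .git.
find . -path ./.git -prune -o -type f -print | sort
```

After both steps, only file.txt and vendor/bundle/another-not-commited-file should remain in the working tree.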
Upvotes: 6
Views: 10063
Reputation: 39
This is the command I use to clean up my Git repos while excluding the venv/
directory and its subdirectories:
git clean -nXd -e \!venv -e \!venv/**
For your case, the first exclusion will be enough:
git clean -nXd -e \!vendor
The second exclusion, \!venv/**, is for other rules in .gitignore that may apply to a file or folder inside vendor. Ex:
.gitignore
*.log
vendor/
bundle/
another-not-commited-file.log
Upvotes: 1
Reputation: 1324557
Git 2.24 (Q4 2019) makes git clean more robust when it comes to nested Git repositories (not just folders).
See commit 69f272b (01 Oct 2019), and commit 902b90c, commit ca8b539, commit 09487f2, commit e86bbcf, commit 3aca580, commit 29b577b, commit 89a1f4a, commit a3d89d8, commit 404ebce, commit a5e916c, commit bbbb6b0, commit 7541cc5 (17 Sep 2019) by Elijah Newren (newren).
(Merged by Junio C Hamano -- gitster -- in commit aafb754, 11 Oct 2019)
clean: avoid removing untracked files in a nested Git repository
Users expect files in a nested git repository to be left alone unless sufficiently forced (with two -f's).
Unfortunately, in certain circumstances, git would delete both tracked (and possibly dirty) files and untracked files within a nested repository.
To explain how this happens, let's contrast a couple cases.
First, take the following example setup (which assumes we are already within a git repo):
git init nested
cd nested
>tracked
git add tracked
git commit -m init
>untracked
cd ..
In this setup, everything works as expected; running 'git clean -fd' will result in fill_directory() returning the following paths:
nested/
nested/tracked
nested/untracked
and then correct_untracked_entries() would notice this can be compressed to:
nested/
and then since "nested/" is a directory, we would call remove_dirs("nested/", ...), which would check is_nonbare_repository_dir() and then decide to skip it.
However, if someone also creates an ignored file:
>nested/ignored
then running 'git clean -fd' would result in fill_directory() returning the same paths:
nested/
nested/tracked
nested/untracked
but correct_untracked_entries() will notice that we had ignored entries under nested/ and thus simplify this list to
nested/tracked
nested/untracked
Since these are not directories, we do not call remove_dirs() which was the only place that had the is_nonbare_repository_dir() safety check -- resulting in us deleting both the untracked file and the tracked (and possibly dirty) file.
One possible fix for this issue would be walking the parent directories of each path and checking if they represent nonbare repositories, but that would be wasteful.
Even if we added caching of some sort, it's still a waste because we should have been able to check that "nested/" represented a nonbare repository before even descending into it in the first place.
Add a DIR_SKIP_NESTED_GIT flag to dir_struct.flags and use it to prevent fill_directory() and friends from descending into nested git repos.
With this change, we also modify two regression tests added in commit 91479b9 ("t7300: add tests to document behavior of clean and nested git", 2015-06-15, Git v2.6.0-rc0).
That commit, nor its series, nor the six previous iterations of that series on the mailing list discussed why those tests coded the expectation they did.
In fact, it appears their purpose was simply to test existing behavior to make sure that the performance changes didn't change the behavior.
However, these two tests directly contradicted the manpage's claims that two -f's were required to delete files/directories under a nested git repository.
While one could argue that the user gave an explicit path which matched files/directories that were within a nested repository, there's a slippery slope that becomes very difficult for users to understand once you go down that route (e.g. what if they specified "git clean -f -d '*.c'"?)
It would also be hard to explain what the exact behavior was; avoid such problems by making it really simple.
Finally, there are still a couple bugs with -ffd not cleaning out enough (e.g. missing the nested .git) and with -ffdX possibly cleaning out the wrong files (paying attention to outer .gitignore instead of inner).
This patch does not address these cases at all (and does not change the behavior relative to those flags), it only fixes the handling when given a single -f.
See this thread for more discussion of the -ffd[X?] bugs.
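The basic case from the commit message (a nested repository with no ignored file) can be reproduced in isolation. This is a rough sketch, assuming a POSIX shell with git and mktemp; the outer variable and the placeholder commit identity are illustrative only. It deliberately does not exercise -ffd, since the message itself says that path still had open bugs.

```shell
#!/bin/sh
# Sketch: a single -f should leave a nested Git repository alone.
# "outer" and the commit identity are placeholders for illustration.
set -e
outer="$(mktemp -d)/outer"
git init -q "$outer"
cd "$outer"

# Same setup as in the commit message: a nested repo with one tracked
# and one untracked file.
git init -q nested
(
  cd nested
  : >tracked
  git add tracked
  git -c user.name="Test" -c user.email="test@example.com" commit -q -m init
  : >untracked
)

# Run the clean from the outer repository with a single -f.
git clean -fd

# Both files inside the nested repository should still be listed.
ls nested
```

According to the manual, removing such a directory takes a second -f.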
With Git 2.25.1 (Feb. 2020), corner case bugs in "git clean" that stem from a (necessarily for performance reasons) awkward calling convention in the directory enumeration API have been corrected.
See commit 0cbb605, commit ad6f215 (16 Jan 2020) by Jeff King (peff).
See commit 2270533 (16 Jan 2020) by Elijah Newren (newren).
See commit f365bf4 (16 Jan 2020) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit 7ab963e, 05 Feb 2020)
dir: treat_leading_path() and read_directory_recursive(), round 2
Signed-off-by: Elijah Newren
I was going to title this "dir: more synchronizing of treat_leading_path() and read_directory_recursive()", a nod to commit 777b42034764 ("dir: synchronize treat_leading_path() and read_directory_recursive()", 2019-12-19, Git v2.25.0-rc0 -- merge), but the title was too long.
Anyway, first the backstory...
fill_directory() has always had a slightly error-prone interface: it returns a subset of paths which might match the specified pathspec; it was intended to prune away some paths which didn't match the specified pathspec and keep at least all the ones that did match it.
Given this interface, callers were responsible to post-process the results and check whether each actually matched the pathspec.
builtin/clean.c did this.
It would first prune out duplicates (e.g. if "dir" was returned as well as all files under "dir/", then it would simplify this to just "dir"), and after pruning duplicates it would compare the remaining paths to the specified pathspec(s).
This post-processing itself could run into problems, though, as noted in commit 404ebceda01c ("dir: also check directories for matching pathspecs", 2019-09-17, Git v2.24.0-rc0 -- merge listed in batch #8):
For the case of git clean and a set of pathspecs of "dir/file" and "more", this caused a problem because we'd end up with dir entries for both: "dir" and "dir/file"
Then correct_untracked_entries() would try to helpfully prune duplicates for us by removing "dir/file" since it's under "dir", leaving us with "dir".
Since the original pathspec only had "dir/file", the only entry left doesn't match and leaves nothing to be removed.
(Note that if only one pathspec was specified, e.g. only "dir/file", then the common_prefix_len optimizations in fill_directory would cause us to bypass this problem, making it appear in simple tests that we could correctly remove manually specified pathspecs.)
That commit fixed the issue -- when multiple pathspecs were specified -- by making sure fill_directory() wouldn't return both "dir" and "dir/file" outside the common_prefix_len optimization path.
This is where it starts to get fun.
In commit b9670c1f5e6b ("dir: fix checks on common prefix directory", 2019-12-19, Git v2.25.0-rc0 -- merge), we noticed that the common_prefix_len wasn't doing appropriate checks and letting all kinds of stuff through, resulting in recursing into .git/ directories and other craziness.
So it started locking down and doing checks on pathnames within that code path.
That continued with commit 777b42034764 ("dir: synchronize treat_leading_path() and read_directory_recursive()", 2019-12-19, Git v2.25.0-rc0 -- merge), which noted the following:
Our optimization to avoid calling into read_directory_recursive() when all pathspecs have a common leading directory mean that we need to match the logic that read_directory_recursive() would use if we had just called it from the root.
Since it does more than call treat_path(), we need to copy that same logic.
...and then it more forcefully addressed the issue with this wonderfully ironic statement:
Needing to duplicate logic like this means it is guaranteed someone will eventually need to make further changes and forget to update both locations.
It is tempting to just nuke the leading_directory special casing to avoid such bugs and simplify the code, but unpack_trees' verify_clean_subdirectory() also calls read_directory() and does so with a non-empty leading path, so I'm hesitant to try to restructure further.
Add obnoxious warnings to treat_leading_path() and read_directory_recursive() to try to warn people of such problems.
You would think that with such a strongly worded description, that its author would have actually ensured that the logic in treat_leading_path() and read_directory_recursive() did actually match and that everything that was needed had at least been copied over at the time that this paragraph was written.
But you'd be wrong, I messed it up by missing part of the logic.
Upvotes: -1
Reputation: 7251
As per git clean --help:
git-clean - Remove untracked files from the working tree
If you add to this Floyd Pink's explanation of -d (in short, the option also removes untracked directories, not just files), then that is why vendor gets removed as well.
Now, supposing you want to remove only not-commited-file (so neither any untracked directory nor another-not-commited-file), I think you should use git clean's interactive mode, so
git clean -i
which will ask you what to do for each untracked file (only files; add -d if you want to be asked about directories too).
EDIT after OP's question editing: you want to remove directories too, so run
git clean -i -d
EDIT 2: since the meaning of -e was not clear to me from the manual, I googled it and found this. I suggest reading the conversation since it explains the real meaning of -e, which is not how the OP intended it (or how it can be understood from the manual).
EDIT 3, more about the -e switch. Following the link I found in edit 2, I decided to try it out. Here is the result, which I hope will help you understand -e.
Content of .gitignore, so I don't commit temporary files:
*.tmp
I issued the commands:
echo "Temporary file" > sample.tmp
git status   # which of course shows "nothing to commit, working directory clean"
git clean -fX -e \!sample.tmp
The result is that ALL files with the tmp extension are deleted (due to -X) EXCEPT sample.tmp. So, in conclusion, what -e really does, in my understanding (and please correct me if I am wrong), is NOT to exclude patterns from the cleaning process but to exclude patterns from the rule of cleaning (in my case the rule was to remove ALL .tmp files, from which I manually excluded sample.tmp).
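The experiment can be made a little more visible by adding a second ignored file, so that one is removed and one survives. This is only a sketch, assuming a POSIX shell with git and mktemp; other.tmp, the repo variable, and the commit identity are my own additions for illustration and were not part of the test I ran above.

```shell
#!/bin/sh
# Sketch: -X removes only ignored files, and a negated -e pattern takes
# one file back out of the ignore rules for this run.
# "repo", other.tmp and the commit identity are hypothetical additions.
set -e
repo="$(mktemp -d)/tmp-demo"
git init -q "$repo"
cd "$repo"

echo '*.tmp' > .gitignore
git add .gitignore
git -c user.name="Test" -c user.email="test@example.com" commit -q -m "Ignore tmp files"

echo "Temporary file" > sample.tmp
echo "Temporary file" > other.tmp

# Single quotes play the same role as the backslash in \!sample.tmp:
# they keep the shell from interpreting the exclamation mark.
git clean -fX -e '!sample.tmp'

# Show what is left in the working tree.
ls
```

Here other.tmp is still ignored and gets cleaned, while sample.tmp is left in place.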
Upvotes: 0
Reputation: 10374
This is because vendor is an untracked directory and you are using the -d option.
As the manual says:
-d
Remove untracked directories in addition to untracked files. If an untracked directory is managed by a different git repository, it is not removed by default. Use -f option twice if you really want to remove such a directory.
I could get the required output using this command:
git clean -x -n
Does that work in the real scenario? If it doesn't, you might want to commit some other file within vendor/bundle and then see.
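To see what dropping -d does on the question's layout, the clean can be run for real in a scratch copy of the test repo. This is a hedged sketch, assuming a POSIX shell with git and mktemp; the repo variable and the commit identity are placeholders. It uses -f in place of -n so the effect shows up on disk.

```shell
#!/bin/sh
# Sketch: without -d, git clean does not touch untracked directories.
# "repo" and the commit identity are placeholders for illustration.
set -e
repo="$(mktemp -d)/debugg-dir"
git init -q "$repo"
cd "$repo"

touch file.txt
git add .
git -c user.name="Test" -c user.email="test@example.com" commit -q -m "Commit"

mkdir -p not-commited-folder vendor/bundle
touch not-commited-file
touch not-commited-folder/another-not-commited-file
touch vendor/bundle/another-not-commited-file

# Same options as above, but actually deleting instead of a dry run.
git clean -x -f

# List the files that are left, skipping .git.
find . -path ./.git -prune -o -type f -print | sort
```

Note that this keeps vendor/bundle, but it also keeps not-commited-folder, since no untracked directory is entered without -d.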
Upvotes: 2