Reputation: 179687
I have a repository with a lot of autogenerated source files I've marked as "binary" in .gitattributes
(they are checked in because not everyone has access to the generator tools). Additionally, the repo has a lot of source-ish files in ignored directories (again, generated as part of the build processes), and a number of actual binary files (e.g. little resource files like icons).
I'd now like to find all the non-auto-generated and non-ignored files in the repo. I thought I'd just do this with find
and a bunch of exclusions, but now I have a horrendous find
statement with a dozen clauses (and it still doesn't perfectly do the job). git ls-files
works but shows me all the binary files without differentiation, which I have to filter out.
So, I'm wondering: is there a simple command I can run which lists every file checked into the repo, and which git
considers a "text" file?
Upvotes: 31
Views: 5330
Reputation: 4272
You can use Git's eol
attributes to find non-binary files.
git ls-files --eol | grep '^i/lf' | cut -f 2-
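This lists all files that are checked in with LF line endings. Text files stored with CRLF or mixed line endings are reported as i/crlf or i/mixed, so this pattern does not match them.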
This has the advantage of using the git ls-files
command, so it can easily be piped to xargs
. It's also a plumbing command, so it might be faster (I haven't benchmarked).
This may be a viable alternative to using the git grep
method as it appears to be more customizable in terms of what one considers binary and not.
Note that you can specify which files git should consider binary in .gitattributes
. For example, if you add *.svg binary
to .gitattributes
, the git grep
method respects this. The eol
attribute also respects it, but not for files that were already checked into the index before the attribute was set. You can always add | grep -v 'attr/-text'
to exclude files that have been marked as binary in .gitattributes
.
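The two filters above can be combined into one small helper. This is only a minimal sketch: the function name text_files_lf is hypothetical (it is not part of Git), and it assumes the usual git ls-files --eol layout of three info columns followed by a tab and the path.

```shell
# Hypothetical helper (the name is an assumption, not a Git command).
# Reads `git ls-files --eol` output on stdin and prints the paths that
#   1) are stored in the index with LF line endings, and
#   2) are not marked -text (e.g. via "binary") in .gitattributes.
text_files_lf() {
  grep '^i/lf' | grep -v 'attr/-text' | cut -f 2-
}

# Intended use inside a repository:
#   git ls-files --eol | text_files_lf
```

Since the helper only reads stdin, its output can be piped on to xargs like any other ls-files result.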
Upvotes: 3
Reputation: 6239
Using git ls-files
and awk
:
git ls-files --eol | awk -F '\t' '{if ($0 !~ /^i\/-text/) print $2}'
Note: this solution also returns empty non-binary files.
Explanation:
--eol: show <eolinfo> and <eolattr> of files. Ref.: https://git-scm.com/docs/git-ls-files#Documentation/git-ls-files.txt---eol
awk -F '\t': parse the piped input lines, splitting them on tabs. At least with git version 2.37.2, the output of git ls-files --eol has 4 human-readable columns, but only the 4th (last) one is preceded by a tab. So when splitting on tabs, awk sees two columns.
if ($0 !~ /^i\/-text/): only match if the line does not start with i/-text. This is our test that the file is NOT a binary file.
print $2: print the 2nd column, which is the file's path (as requested by the OP). Note that this solution also works for filenames containing spaces.
Acknowledgment: My answer expands on @CervEd's answer (https://stackoverflow.com/a/67346778/341320) and also draws on another answer from @Quential33 (https://stackoverflow.com/a/66796286/341320)
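To see the tab split at work without a repository, the same awk program can be fed a few hand-written lines in the git ls-files --eol layout. This is a sketch only: the function names non_binary_paths and sample_eol_lines, and the sample paths, are made up for illustration and are not real repository output.

```shell
# Hypothetical wrapper around the awk filter from this answer.
non_binary_paths() {
  awk -F '\t' '{if ($0 !~ /^i\/-text/) print $2}'
}

# Hand-written sample lines (an assumption, not real output):
# three info columns, then a tab, then the path.
sample_eol_lines() {
  printf 'i/lf    w/lf    attr/\tdocs/read me.txt\n'
  printf 'i/none  w/none  attr/\tempty.txt\n'
  printf 'i/-text w/-text attr/\tlogo.png\n'
}

# Only the paths of lines that do not start with i/-text are printed.
sample_eol_lines | non_binary_paths
```

With these sample lines the filter prints docs/read me.txt and empty.txt and drops logo.png, which matches the claims about spaces and empty files. In a real repository the input would come from git ls-files --eol instead.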
Upvotes: 0
Reputation: 384124
git grep --cached -Il ''
lists all non-empty regular (no symlinks) text files:
-I: don't match the pattern in binary files
-l: only show the matching file names, not the matching lines
'': the empty string makes git grep match any non-empty file
--cached: search the content staged in the index instead of the working tree, so files added with git add but not yet committed are included (optional)
Or you could use How to determine if Git handles a file as binary or as text? in a for loop with git ls-files.
TODO empty files.
Find all binary files instead: Find all binary files in git HEAD
Tested on Git 2.16.1 with this test repo.
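A quick way to check this behaviour yourself is a throwaway repository. The sketch below is not the test repo mentioned above: the directory and file names are invented for illustration, and it assumes git and mktemp are available.

```shell
# Build a scratch repository (all names here are hypothetical).
demo=$(mktemp -d)
git init -q "$demo"
cd "$demo"

printf 'hello\n' > notes.txt     # ordinary text file
printf 'a\000b\n' > icon.bin     # contains a NUL byte, so Git treats it as binary
: > empty.txt                    # empty file

# Stage everything; --cached searches the index, so no commit is needed.
git add .

# List staged, non-empty text files.
text_files=$(git grep --cached -Il '')
printf '%s\n' "$text_files"
```

Only notes.txt is listed here: icon.bin is skipped by -I, and empty.txt has no line for the empty pattern to match, which is the gap the TODO above refers to.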
Upvotes: 39
Reputation: 2546
A clever hack to achieve this: list all non-binary files that contain carriage returns
$ git grep --cached -I -l -e $'\r'
For my case, an empty string works better:
$ git grep --cached -I -l -e $''
Took it from git list binary and/or non-binary files?.
Upvotes: 4
Reputation: 1328282
The standard method for listing non-ignored files is:
git ls-files --exclude-standard --cached
But, as you have seen, it lists all versioned files.
One workaround could be to define, in a separate file "exclude_binaries", exclusion patterns matching all the binaries that you know of.
git ls-files --exclude-standard --cached \
--exclude-from=/path/to/exclude_binaries
That would be less complex than the find
command, but it doesn't provide a fully automated way to list non-binary files: you still have to identify the binaries and list them in a separate pattern file.
Upvotes: 1