Reputation: 7323
How can I know if a file is a binary file?
For example, a compiled C file is a binary file.
I want to read all files from some directory, but I want to ignore binary files.
Upvotes: 72
Views: 92171
Reputation: 11243
I use
! grep -qI . "$path"
The only drawback I can see is that it treats an empty file as binary, but then again, who decides whether that is wrong?
EDIT based on @mgutt's suggestion:
In some contexts the file could be huge, so depending on what you need to do, it might be safer and sufficient to read only part of the file:
head -c 1024 "$path" | grep -qI .
Keep in mind though, that you will need to choose the size wisely; 1024 bytes of text plus a null byte is still a binary file.
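The two commands above can be wrapped in a small function. This is only a sketch: the names is_text and print_text_files are made up for illustration, the 1024-byte default is the same arbitrary choice as above, and treating an empty file as text is my assumption, not something grep decides.

```shell
#!/bin/sh
# is_text FILE [BYTES]: succeed if the first BYTES bytes (default 1024)
# of FILE look like text to grep -I. (Hypothetical helper name.)
is_text() {
    [ -f "$1" ] || return 1      # skip directories, devices, missing files
    [ -s "$1" ] || return 0      # assumption: an empty file counts as text
    # grep -I treats binary input as non-matching; "." matches any character.
    head -c "${2:-1024}" "$1" | grep -qI .
}

# print_text_files DIR: cat every file directly inside DIR that passes is_text.
print_text_files() {
    for f in "$1"/*; do
        if is_text "$f"; then
            cat "$f"
        fi
    done
}
```

Call it as print_text_files /some/dir. Note that a file consisting only of newlines has nothing for "." to match, so it would still be reported as binary.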
Upvotes: 17
Reputation: 7327
Perhaps this would suffice:
if file /path/to/file | grep -iq ASCII; then
    echo "Text file"
else
    echo "Binary"
fi
Upvotes: 0
Reputation: 48804
Going off Bach's suggestion, I think --mime-encoding is the best flag to get something reliable from file.
file --mime-encoding [FILES ...] | grep -v '\bbinary$'
will print the files that file believes have a non-binary encoding. You can pipe this output through cut -d: -f1 to trim the : encoding suffix if you just want the filenames.
Caveat: as @yugr reports below, .doc files report an encoding of application/mswordbinary. This looks to me like a bug: the mime type is erroneously being concatenated with the encoding.
$ for flag in --mime --mime-type --mime-encoding; do
echo "$flag"
file "$flag" /tmp/example.{doc{,x},png,txt}
done
--mime
/tmp/example.doc: application/msword; charset=binary
/tmp/example.docx: application/vnd.openxmlformats-officedocument.wordprocessingml.document; charset=binary
/tmp/example.png: image/png; charset=binary
/tmp/example.txt: text/plain; charset=us-ascii
--mime-type
/tmp/example.doc: application/msword
/tmp/example.docx: application/vnd.openxmlformats-officedocument.wordprocessingml.document
/tmp/example.png: image/png
/tmp/example.txt: text/plain
--mime-encoding
/tmp/example.doc: application/mswordbinary
/tmp/example.docx: binary
/tmp/example.png: binary
/tmp/example.txt: us-ascii
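A rough sketch of the same idea as a loop, which avoids the cut step and also copes with the concatenated application/mswordbinary output by matching on the "binary" suffix. The function name list_text_files is invented here, and I am assuming a file implementation that supports -b together with --mime-encoding.

```shell
#!/bin/sh
# list_text_files FILE...: print each regular file whose encoding, as
# reported by file(1), does not end in "binary". (Hypothetical helper name.)
list_text_files() {
    for f in "$@"; do
        [ -f "$f" ] || continue
        # -b omits the "name: " prefix, so no cut(1) step is needed.
        encoding=$(file -b --mime-encoding "$f")
        case $encoding in
            *binary) ;;                  # includes "application/mswordbinary"
            *) printf '%s\n' "$f" ;;
        esac
    done
}
```

For example, list_text_files /tmp/example.* should print only the .txt file from the listing above.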
Upvotes: 6
Reputation: 166379
grep
Assuming binary means a file containing non-printable characters (other than blank characters such as spaces, tabs, or newlines), this may work with both BSD and GNU grep:
$ grep -q '[^[:print:][:blank:]]' file && echo Binary || echo Text
Note: GNU grep will report a file containing only NUL characters as text, whereas the BSD version handles it correctly.
For more examples, see: How do I grep for all non-ASCII characters.
Upvotes: 2
Reputation: 166379
cat + grep
Assuming binary means a file containing NUL characters, this shell command can help:
(cat -v file.bin | grep -q "\^@") && echo Binary || echo Text
or:
grep -q "\^@" <(cat -v file.bin) && echo Binary
This is a workaround for grep -q "\x00", which works with BSD grep but not with the GNU version.
The -v option of cat makes non-printing characters visible in caret notation, for example:
$ printf "\x00\x00" | hexdump -C
00000000 00 00 |..|
$ printf "\x00\x00" | cat -v
^@^@
$ printf "\x00\x00" | cat -v | hexdump -C
00000000 5e 40 5e 40 |^@^@|
where each ^@ represents a NUL character. So once these sequences are found, we assume the file is binary.
The disadvantage of this method is that it can produce false positives when the file contains the literal characters ^@ rather than an actual NUL byte. For example:
$ printf "\x00\x00^@^@" | cat -v | hexdump -C
00000000 5e 40 5e 40 5e 40 5e 40 |^@^@^@^@|
See also: How do I grep for all non-ASCII characters.
Upvotes: 3
Reputation: 166379
grep
Here is a simple solution to check a single file using BSD grep (on macOS/Unix):
grep -q "\x00" file && echo Binary || echo Text
which checks whether the file contains a NUL character.
To read all non-binary files recursively with this method, you can use find:
find . -type f -exec sh -c 'grep -q "\x00" "$1" || cat "$1"' _ {} ";"
Or even simpler, using just grep:
grep -rv "\x00" .
For just current folder, use:
grep -v "\x00" *
Unfortunately the above examples won't work with GNU grep, but there is a workaround.
grep
Since GNU grep ignores NUL characters, it's possible to check for other non-ASCII characters instead:
$ grep -P "[^\x00-\x7F]" file && echo Binary || echo Text
Note: It won't work for files containing only NULL characters.
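If the goal is only to detect NUL bytes regardless of which grep is installed, one option is to leave grep out and compare byte counts before and after deleting NULs with tr. This swaps in a different technique from the grep commands above; has_nul is a made-up name, and the sketch reads the whole file twice, so it may be slow on huge files.

```shell
#!/bin/sh
# has_nul FILE: succeed if FILE contains at least one NUL byte.
# (Hypothetical helper; uses tr and wc instead of grep.)
has_nul() {
    # $(( ... )) strips the padding some wc implementations add.
    total=$(( $(wc -c < "$1") ))
    kept=$(( $(LC_ALL=C tr -d '\000' < "$1" | wc -c) ))
    [ "$total" -ne "$kept" ]
}
```

Usage would look like: has_nul file && echo Binary || echo Text. Unlike the GNU grep check above, this also flags files that contain nothing but NUL characters.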
Upvotes: 7
Reputation: 6939
You can also do this by leveraging the diff command. Check this answer:
Upvotes: 0
Reputation: 1062
Try the following command-line:
file "$FILE" | grep -vq 'ASCII' && echo "$FILE is binary"
Upvotes: 1
Reputation: 1
Excluding binary files with tr -d "[[:print:]\n\t]" < file | wc -c is kind of brute force, but it involves no heuristic guesswork either.
find . -maxdepth 1 -type f -exec /bin/sh -c '
    for file in "$@"; do
        if [ $(LC_ALL=C LANG=C tr -d "[[:print:]\n\t]" < "$file" | wc -c) -gt 0 ]; then
            echo "${file} is no ASCII text file (UNIX)"
        else
            echo "${file} is ASCII text file (UNIX)"
        fi
    done
' _ '{}' +
The following brute-force approach using grep -a -m 1 $'[^[:print:]\t]' file seems quite a bit faster, though.
find . -maxdepth 1 -type f -exec /bin/sh -c '
    tab="$(printf "\t")"
    for file in "$@"; do
        if LC_ALL=C LANG=C grep -a -m 1 "[^[:print:]${tab}]" "$file" 1>/dev/null 2>&1; then
            echo "${file} is no ASCII text file (UNIX)"
        else
            echo "${file} is ASCII text file (UNIX)"
        fi
    done
' _ '{}' +
Upvotes: 0
Reputation: 5868
perl -E 'exit((-B $ARGV[0])?0:1);' file-to-test
This can be used to check whether "file-to-test" is binary. The command exits with code 0 for binary files and with code 1 otherwise.
The reverse check for text file can look like the following command:
perl -E 'exit((-T $ARGV[0])?0:1);' file-to-test
Likewise the above command will exit with status 0 if the "file-to-test" is text (not binary).
Read more about the -B and -T checks using the command perldoc -f -X.
Upvotes: 4
Reputation: 80384
Use Perl’s built-in -T file test operator, preferably after ascertaining that it is a plain file using the -f file test operator:
$ perl -le 'for (@ARGV) { print if -f && -T }' \
getwinsz.c a.out /etc/termcap /bin /bin/cat \
/dev/tty /usr/share/zoneinfo/UTC /etc/motd
getwinsz.c
/etc/termcap
/etc/motd
Here’s the complement of that set:
$ perl -le 'for (@ARGV) { print unless -f && -T }' \
getwinsz.c a.out /etc/termcap /bin /bin/cat \
/dev/tty /usr/share/zoneinfo/UTC /etc/motd
a.out
/bin
/bin/cat
/dev/tty
/usr/share/zoneinfo/UTC
Upvotes: 3
Reputation: 6682
Adapted from excluding binary file
find . -exec file {} \; | grep text | cut -d: -f1
Upvotes: 18
Reputation: 16039
Use the file utility. Sample usage:
$ file /bin/bash
/bin/bash: Mach-O universal binary with 2 architectures
/bin/bash (for architecture x86_64): Mach-O 64-bit executable x86_64
/bin/bash (for architecture i386): Mach-O executable i386
$ file /etc/passwd
/etc/passwd: ASCII English text
$ file code.c
code.c: ASCII c program text
Upvotes: 80