Reputation: 169
I need to write C code that checks whether a file is text (ASCII) or binary.
Could someone help? Thanks.
Upvotes: 6
Views: 7573
Reputation: 137
You can use libmagic, the library behind the "file" command. The code below shows roughly how "file" identifies a file's type. (It is quick and dirty, so it probably needs to be cleaned up.)
#include <string.h>
#include <magic.h>
#include <stdio.h>
//------------------------------------------------------------------------------
/* Open a libmagic handle and load the default magic database.
   Returns NULL on failure. */
struct magic_set * prep_magic(int flags)
{
    struct magic_set *magic = magic_open(flags);
    const char *errstring;
    const char *magicfile;
    if (magic == NULL)
    {
        fprintf(stderr, "Can't create magic\n");
        return NULL;
    }
    /* NULL asks libmagic for its default database path. */
    magicfile = magic_getpath(NULL, 0);
    if (magic_load(magic, magicfile) == -1)
    {
        fprintf(stderr, "%s\n", magic_error(magic));
        magic_close(magic);
        return NULL;
    }
    /* Report any non-fatal message left over from loading. */
    if ((errstring = magic_error(magic)) != NULL)
        fprintf(stderr, "%s\n", errstring);
    return magic;
/* END FUNCTION prep_magic */ }
//------------------------------------------------------------------------------
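int main(void)
{
    int flags = 0;
    const char *testfile = "/etc/motd";
    const char *typer;
    struct magic_set *msetptr = prep_magic(flags);
    if (msetptr == NULL)
    {
        printf("no mset ptr\n");
        return 1;   /* do not pass a NULL handle to magic_file */
    }
    typer = magic_file(msetptr, testfile);
    if (typer == NULL)
    {
        /* magic_file returns NULL on error; never print NULL with %s */
        const char *err = magic_error(msetptr);
        printf("magic_file failed: %s\n", err != NULL ? err : "unknown error");
    }
    else
        printf("typer = %s\n", typer);
    magic_close(msetptr);   /* release the handle and its database */
    return 0;
/* END PROGRAM */ }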
Upvotes: 1
Reputation: 213318
The typical method is to read the first several hundred bytes and look for an ASCII NUL byte.
If the file contains NUL, it is almost certainly a binary file. Most binary files do contain NUL bytes, but text files in ASCII or UTF-8 should never contain them (UTF-16 text is a known exception).
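#include <stdbool.h>  /* bool */
#include <string.h>   /* memchr, size_t */
/* Returns true if the buffer contains at least one NUL byte. */
bool is_binary(const void *data, size_t len)
{
    return memchr(data, '\0', len) != NULL;
}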
Be warned that this is a heuristic. In other words, it will be wrong sometimes.
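To apply this to a file rather than a buffer, something has to read the sample first. The following is a minimal sketch, not a definitive implementation: the function name file_looks_binary and the 512-byte SAMPLE_SIZE are assumptions made for illustration (the answer only says "several hundred bytes").

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sample size, standing in for "several hundred bytes". */
enum { SAMPLE_SIZE = 512 };

/* Returns 1 if the first SAMPLE_SIZE bytes contain a NUL (probably binary),
   0 if they do not (probably text), and -1 if the file cannot be read. */
int file_looks_binary(const char *path)
{
    unsigned char buf[SAMPLE_SIZE];
    size_t n;
    FILE *fp;

    assert(path != NULL);

    /* "rb" avoids newline translation on platforms that perform it. */
    fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    n = fread(buf, 1, sizeof buf, fp);
    if (ferror(fp))
    {
        fclose(fp);
        return -1;
    }
    fclose(fp);

    /* An empty file has no NUL bytes, so it is reported as text. */
    return memchr(buf, '\0', n) != NULL;
}
```

Only the sample is inspected, so a file whose first NUL byte appears after the sampled prefix is reported as text. That is one of the ways the heuristic can be wrong.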
Upvotes: 6
Reputation: 62048
Read all characters and see if all of them are ASCII, that is, with codes from 0 to 127 inclusive.
Some tools determine whether a file is a text file or a binary file by just checking whether or not it has any byte with code 0.
Clearly, these two methods give different results for some files, so you have to define exactly what you're looking for.
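A small sketch of the two checks side by side may make the difference concrete. The function names is_all_ascii and has_no_nul are made up for this illustration, and both operate on a buffer that has already been read from the file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Method 1: true if every byte is in the ASCII range 0..127. */
bool is_all_ascii(const unsigned char *data, size_t len)
{
    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
    {
        if (data[i] > 127)
            return false;
    }
    return true;
}

/* Method 2: true if no byte has code 0. */
bool has_no_nul(const unsigned char *data, size_t len)
{
    assert(data != NULL || len == 0);
    return len == 0 || memchr(data, '\0', len) == NULL;
}
```

The two disagree in both directions. The UTF-8 bytes for "café" (0x63 0x61 0x66 0xC3 0xA9) contain no NUL, so method 2 calls them text, but 0xC3 and 0xA9 are above 127, so method 1 does not. Conversely, the bytes 'a', 0, 'b' all fall within 0 to 127, so method 1 accepts them, while method 2 rejects them because of the NUL.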
Upvotes: 3