CodeKingPlusPlus
CodeKingPlusPlus

Reputation: 16081

C programming string parsing

I am parsing a macro definition from a Makefile into two strings, the name of the macro and the body. For example here is a macro definition line from my Makefile:

macro-1 = body-1

My code produces a bus error/segmentation fault.

static void parse_macro_def(const char* line)
{
   char* m_name;
   int name_pos = 0;

   int i = 0;
   while(line[i++] != '=')                    //iterate until an equal sign is found
   {
      if(!isspace(line[i]))          //copy characters to m_name  unless the character is whitespace
      {
         m_name[name_pos++] = line[i];
      }
   }
}
m_name[name_pos] = '\0';

m_name should be set to macro-1 Thanks for all the help!

Upvotes: 0

Views: 1097

Answers (3)

Jonathan Leffler
Jonathan Leffler

Reputation: 754110

You don't initialize m_name so it points anywhere, so you are writing at random, and crashing.

You need to allocate and return the space, or pass the space in (along with the length of the space), ensuring you don't overwrite in every case.


You should note that white space characters in a macro name are a bug in the macro definition. There can be leading white space; there can be trailing white space; but there can't be white space in the middle of a name. Now, if you are assuming that you have been given a valid, working Makefile to parse, you may be able to get away with ignoring this subtlety. If you're writing a replacement for make, you can't.

Unless you've previously validated that there is an equals sign in the string, you should also check that you don't run off the end of the string (you don't scan past a NUL '\0'). In fact, in robust code, you'd probably ensure that out of paranoia.

while (line[i] != '\0' && line[i] != '=')
{
    ...
}

And, while writing that, I realized that you increment i in the while condition, and then check whether the next character is a space in the body of the loop. That's a little aconventional, shall we say. If you come across a macro:

MACRO=value

you will copy the = into m_name when the loop condition checks the O. And, AFAICS, you won't copy the M.


Note that your line:

m_name[name_pos] = '\0';

is outside any function and therefore a syntax error.

Upvotes: 1

Mario
Mario

Reputation: 36497

There are some flaws in your code, e.g. you'll never check/copy the first character in a line (because you increase i immediately). Also m_name points nowhere (it's undefined).

In general, I'd use a bit different approach. I'm not sure where you'd like to expect spaces, but right now (assuming your code would work) you're concatenating everything, e.g. "some value = something" (I know this wouldn't be valid code) would result in the variable name "somevalue" due to you skipping all space characters.

I'd use something like this (it's late and I'm sleepy, so might include a few bugs, but should give you some idea on what you could try to do):

char name[256];
const char *start = line; // points to beginning of the line
const char *end = strchr(line, '='); // returns a pointer to the position where there's an equal sign (if there's any; 0 otherwise)
if (end) { // only try to parse if there's an equal sign
    for(; start < end && isspace(*start); ++start); // this will effectively remove all leading space characters
    for(; end > start && isspace(*(end - 1)); --end); // this will effectively remove all trailing space characters
    strncpy(name, start, end - start); // copy the name
    // do something else here
}

Also, depending on what you're going to do in other parts, it might be useful to use some kind of regular expression library (if the added overhead is worth it, depends on your project) and use some expression like \s*(.*?)\s*=.

Upvotes: 0

dreamlax
dreamlax

Reputation: 95335

m_name is indeterminate, you have not assigned any value to it. You probably want to assign it the return value of a call to malloc or calloc.

Also, your loop will read past the end of the line if the line does not contain an =. You need to make sure that your loop terminates once it has reached either = or the end of the line (which may be '\n' or '\0' depending on your needs).

Upvotes: 0

Related Questions