Jack

Reputation: 361

How to extract just part of this string in C?

I have a version file I need to parse to get certain versions in C99. For example purposes, say one of the strings looks like this:

FILE: EXAMPLE ABC123459876-001 REV 1.IMG

The 12345 digits can be arbitrary, but they are always followed by 4 digits, a hyphen, a revision, and an extension. I just want to return the middle of this string, that is, the file name plus the main version: "EXAMPLE 9876-001 REV 1". I got it to work in the regex101 online tester with something like:

"(?<=EXAMPLE ABC.....)(....-... REV .)(?=.IMG)"

... but C99 itself has no regex library, and the POSIX regex functions (regcomp/regexec) do not support lookahead or lookbehind, so this does not work for me. Should I be using strstr() or strtok() instead? I'm just looking for ideas on the best way to do this in C, thanks.

Upvotes: 1

Views: 484

Answers (3)

Nathan Owen

Reputation: 165

The simplest way is probably sscanf, but an unbounded %[ conversion risks a buffer overflow. Either give every conversion a maximum field width, or make sure each buffer is at least as long as the longest input string (for example, the maximum file path length on the system).

Try something like this (code not tested):

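int ret;
/* Each buffer is one byte larger than the field width used in sscanf below. */
char sequence_num_prefix[ 4 + 1 ] = {0};
char sequence_num_postfix[ 3 + 1 ] = {0};
char version_num[ 15 + 1 ] = {0};
char my_name[ MAX_PATH_LEN + 1 ] = {0};

/* Assumes input_path_buf points at "EXAMPLE"; skip any "FILE: " prefix first.
   %*5[0-9] consumes the five arbitrary digits after "ABC" without storing them. */
ret = sscanf( input_path_buf, "EXAMPLE ABC%*5[0-9]%4[0-9]-%3[0-9] REV %15[0-9]",
              sequence_num_prefix, sequence_num_postfix, version_num );

if( ret != 3 )
{
    /* error: the input did not match the expected layout */
}

snprintf( my_name, sizeof( my_name ), "EXAMPLE %s-%s REV %s",
          sequence_num_prefix, sequence_num_postfix, version_num );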

Of course a safer way would be to use while loops, or, for cleanliness, use Bison.
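
As a rough illustration of the loop-based alternative, here is a minimal sketch. The function name extract_by_scanning is hypothetical, and it assumes the layout from the question: the literal text "EXAMPLE ABC", exactly five digits to discard, then everything up to the first dot. It does not check the "4 digits, hyphen" part.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes e.g. "EXAMPLE 9876-001 REV 1" into out.
 * Returns 0 on success, -1 if the line does not have the assumed layout
 * or out is too small. */
int extract_by_scanning(const char *line, char *out, size_t out_size)
{
    /* Locate the fixed part, wherever it sits in the line. */
    const char *p = strstr(line, "EXAMPLE ABC");
    if (p == NULL)
        return -1;
    p += strlen("EXAMPLE ABC");

    /* Skip the five arbitrary digits, checking each one. */
    int skipped = 0;
    while (skipped < 5) {
        if (!isdigit((unsigned char)*p))
            return -1;
        p++;
        skipped++;
    }

    /* Everything up to the first dot is the part we want to keep. */
    const char *end = strchr(p, '.');
    if (end == NULL)
        return -1;

    /* %.*s copies exactly (end - p) characters; snprintf bounds the write. */
    int n = snprintf(out, out_size, "EXAMPLE %.*s", (int)(end - p), p);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

Because every write goes through snprintf with the size of the destination, an over-long input results in an error return rather than an overflow.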

Upvotes: 0

deiga

Reputation: 1637

Do you really need regex for this? Could you not just split this string into substrings and work with that?

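  1. Remove the extension by finding the dot with strchr
  2. Take the file name as a substring
  3. Use a regex to get the rest, for example ([0-9]{4}-.*$) compiled with REG_EXTENDED; the hyphen makes it match the four digits right before it rather than the first four digits after ABC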
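
The steps above could be sketched roughly as follows with the POSIX regex API. The function name extract_version and the 256-byte working buffer are assumptions for illustration, as is the use of strrchr (last dot) rather than strchr, and the pattern anchors on the hyphen so that it picks the four digits directly before it.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes e.g. "EXAMPLE 9876-001 REV 1" into out.
 * Returns 0 on success, -1 on any mismatch or if out is too small. */
int extract_version(const char *line, char *out, size_t out_size)
{
    char work[256];          /* assumed upper bound on the line length */
    regex_t re;
    regmatch_t m[2];         /* m[0] = whole match, m[1] = first group */

    /* Assumption: the file name is the literal word "EXAMPLE". */
    const char *name = strstr(line, "EXAMPLE");
    if (name == NULL || strlen(name) >= sizeof work)
        return -1;
    strcpy(work, name);      /* safe: length checked just above */

    /* Step 1: cut the extension off at the last dot. */
    char *dot = strrchr(work, '.');
    if (dot == NULL)
        return -1;
    *dot = '\0';

    /* Step 3: four digits directly before the hyphen, then the rest. */
    if (regcomp(&re, "([0-9]{4}-.*$)", REG_EXTENDED) != 0)
        return -1;
    int rc = regexec(&re, work, 2, m, 0);
    regfree(&re);
    if (rc != 0)
        return -1;

    /* Step 2: put the file name in front of the matched version part. */
    int n = snprintf(out, out_size, "EXAMPLE %s", work + m[1].rm_so);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

Note that {4} is an interval expression, so the pattern has to be compiled with REG_EXTENDED (or written as \{4\} in basic syntax).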

Upvotes: 1

SourceOverflow

Reputation: 2048

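So you want everything except the FILE: prefix and the file extension? Since the FILE: prefix sounds static, this regex should work (compile it with REG_EXTENDED; matching is case-sensitive unless you also pass REG_ICASE):

FILE: ([^.]*)\..*

You can then read that group from the regmatch_t array that regexec fills in.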
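
For reference, a minimal sketch of reading a capture group with regexec. The function name capture_between is hypothetical, and the pattern is written with an upper-case FILE: to match the sample string from the question.

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper: copies capture group 1 into out.
 * Returns 0 on success, -1 on no match or if out is too small. */
int capture_between(const char *line, char *out, size_t out_size)
{
    regex_t re;
    regmatch_t m[2];   /* m[0] = whole match, m[1] = first group */

    /* In a C string literal the backslash must be doubled. */
    if (regcomp(&re, "FILE: ([^.]*)\\..*", REG_EXTENDED) != 0)
        return -1;
    int rc = regexec(&re, line, 2, m, 0);
    regfree(&re);
    if (rc != 0)
        return -1;

    /* rm_so and rm_eo are byte offsets into line. */
    size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
    if (len >= out_size)
        return -1;
    memcpy(out, line + m[1].rm_so, len);
    out[len] = '\0';
    return 0;
}
```

For the sample line this yields "EXAMPLE ABC123459876-001 REV 1", so the "ABC12345" part would still have to be removed in a separate step.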

Upvotes: 1
