Cinder Biscuits
Cinder Biscuits

Reputation: 5261

Splitting a string to remove everything after a delimiter

I am trying to parse the phone number out of a sip uri, or if the string is just a number, returning that. Basically, I want to chop off the @ and anything after it if it exists.
I wrote a small function using strtok() but the function always returns NULL.
Can anyone tell me what I'm doing wrong here?

char* GetPhoneNumber(const char* sCallee) {
    char* buf = (char*)malloc(strlen(sCallee) + 1);
    strcpy(buf, sCallee);
    char *p = strtok (buf, "@");
    char *q = strtok (p, ":");
    if (buf) {
        free(buf);
    }
    return q;
}

int main() {
    const char* raw_uri = "[email protected]";
    char* number = GetPhoneNumber(raw_uri);
    if (number == NULL) {
        printf("I am screwed! %s comes out null!", raw_uri);
    }
    char* second = GetPhoneNumber("2109999999");
    if (second == NULL) {
        printf("This does not work either.");
    }
}

Upvotes: 0

Views: 434

Answers (2)

David C. Rankin
David C. Rankin

Reputation: 84559

Your code works fine, you just cannot free(buf) before the return or you release the memory holding number and second leading to Undefined Behavior. You further need to validate each step. Suggest something like:

(Note: updateded to protect against a leading '@' in sCallee or a leading ':' in p resulting in NULL being returned)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char *GetPhoneNumber(const char* sCallee) 
{
    if (*sCallee == '@') {  /* protect against leading '@' */
        fprintf (stderr, "invalid sCallee - leading '@'\n");
        return NULL;
    }

    char* buf = malloc (strlen(sCallee) + 1);
    if (!buf) {         /* if you allocate/validate */
        perror ("malloc - buf");
        return NULL;
    }
    strcpy(buf, sCallee);

    char *p = strtok (buf, "@");    /* get first token with '@' */
    if (!p) {   /* validate */
        fprintf (stderr, "error: strtok with '@' failed.\n");
        return NULL;
    }
    if (*p == ':') {        /* protect against leading ':' */
        fprintf (stderr, "invalid p - leading ':'\n");
        free (buf);
        return NULL;
    }
    char *q = strtok (p, ":");      /* get first token with ':' */

//     free(buf);

    return q;
}

int main () {

    const char* raw_uri = "2109999999:[email protected]";
    char* number = GetPhoneNumber(raw_uri);

    if (number == NULL) {
        printf("I am screwed! %s comes out null!\n", raw_uri);
    }
    else {
        printf ("number: %s\n", number);
        free (number);
    }

    char* second = GetPhoneNumber("2109999999");

    if (second == NULL) {
        printf("This does not work either.\n");
    }
    else {
        printf ("second: %s\n", second);
        free (second);
    }
}

(note: There is no need to cast the return of malloc, it is unnecessary. See: Do I cast the result of malloc?)

Example Use/Output

$ ./bin/strtokpnum
number: 2109999999
second: 2109999999

Memory Use/Error Check

In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.

For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.

$ valgrind ./bin/strtokpnum
==24739== Memcheck, a memory error detector
==24739== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==24739== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==24739== Command: ./bin/strtokpnum
==24739==
number: 2109999999
second: 2109999999
==24739==
==24739== HEAP SUMMARY:
==24739==     in use at exit: 0 bytes in 0 blocks
==24739==   total heap usage: 2 allocs, 2 frees, 35 bytes allocated
==24739==
==24739== All heap blocks were freed -- no leaks are possible
==24739==
==24739== For counts of detected and suppressed errors, rerun with: -v
==24739== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Always confirm that you have freed all memory you have allocated and that there are no memory errors.


Protect Against All Corner Cases

Though not part of the question, there are possible corner cases such as multiple leading '@' and multiple leading ':' in the secondary string. If you are going to protect against the possibility of having multiple leading '@' delimiters or multiple leading ':' in the secondary string, then simply use the multiple allocation method. Adding any additional checks and you might as well just walk a pointer down sCallee.

char *gettok (const char *s, const char *d1, const char *d2)
{
    char *buf = malloc (strlen (s) + 1),
        *p = NULL,
        *q = NULL,
        *s2 = NULL;

    if (!buf) { /* validate allocation */
        perror ("malloc - buf");
        return NULL;
    }
    strcpy (buf, s);    /* copy s to buf */
    if (!(p = strtok (buf, d1))) {  /* if token on d1 fails */
        free (buf);                 /* free buf */
        return NULL;
    }
    if (!(q = strtok (p, d2))) {    /* if token on d2 fails */
        free (buf);                 /* free buf */
        return NULL;
    }

    /* allocate/validate return */
    if (!(s2 = malloc (strlen (q) + 1))) {
        perror ("malloc - s2");
        return NULL;
    }
    strcpy (s2, q); /* copy token */

    free (buf);     /* free buf */

    return s2;      /* return token */
}

In that case you would simply call

gettok ("@@@@@:::::2109999999:[email protected]", "@", ":');

and you are protected.

(and if you are getting strings like that from your sip uri, fix that process)

Upvotes: 3

Pablo
Pablo

Reputation: 13580

edit

This

If it's returning NULL because the while loops ends when q is NULL. So I assume that the q = strtok (NULL, ":"); also returns NULL and that's why it leaves the loop.

makes no sense anymore, since you've edited your question and removed the code in question.

end edit

Regardless, you are using it wrong, though.

strtok returns a pointer to the original string plus an offset that marks the beginning of the next token. The pointer is at an offset of buf. So when you do free(buf), you are making the pointers returned by strtok also invalid.

You should also check first if malloc returns NULL and then try to parse it. Checking of malloc returning NULL after the parsing is wrong. Also you would need to make a copy of the value you are returning.

char* GetPhoneNumber(const char* sCallee) {
    char* buf = malloc(strlen(sCallee) + 1);
    if(buf == NULL)
        return NULL;

    strcpy(buf, sCallee);

    char *p = strtok (buf, "@");
    if(p == NULL)
    {
        // no token found
        free(buf);
        return NULL;
    }

    char *q = strtok (p, ":"); // makes no sense after your edit
                               // I don't see any colons in your input
                               // but I leave this to show you how
                               // it would be done if colons were present

    if(q == NULL)
    {
        // no token found
        free(buf);
        return NULL;
    }

    char *copy = malloc(strlen(q) + 1);
    if(copy == NULL)
    {
        free(buf);
        return NULL;
    }

    strcpy(copy, q);
    free(buf);

    return copy;
}

Also when you call this function, you have to remember to free the pointer returned by GetPhoneNumber.

edit2

To be honest, I don't see why you even use strtok if the number comes before @. You can use strchr instead:

char* GetPhoneNumber(const char* sCallee) {

    if(sCallee == NULL)
        return NULL;

    char *p = strchr(sCallee, '@');

    if(p == NULL)
        return NULL; // wrong format

    char *q = calloc(1, p - sCallee + 1);

    if(q == NULL)
        return NULL;

    strncpy(q, sCallee, p - sCallee);

    // q already \0-terminated because of calloc

    return q;
}

Upvotes: 3

Related Questions