Math_Seeker
Math_Seeker

Reputation: 45

Inefficiently using strstr and strchr

While reviewing my code, my professor said that my use of strstr and strchr results in a lot of wasted resources as every and each one of them scans the string.

Can I reduce the amount of functions in a good way?

This code scans a string and based on set parameters decides whether the input is valid or not.
ch1 is '@' and ch2 is '.', (email[i]) is the string.

    for (i = 0; email[i] != 0; i++) {
        {
            if (strstr(email, "@.") ||
                strstr(email, ".@") ||
                strstr(email, "..") ||
                strstr(email, "@@") ||
                email[i] == ch1 ||
                email[i] == ch2 ||
                email[strlen(email) - 1] == ch1 ||
                email[strlen(email) - 1] == ch2) {
                printf("The entered e-mail '%s' does not pass the required parameters, Thus it is invalid\n", email);
            } else {
                printf("The email '%s' is a valid e-mail address\n",email);
            }
            break;
        }
    }

This is the snippet I'm talking about.

Should I write my own code that does the checking once? if so, can you give me some pointers in that regards? thank you.

EDIT: Thank you very much for your responses, I did learn of the mistakes in my code and hopefully I learn from them.
Thanks again!

EDIT:2: I want to thank you again for your responses, they have helped me immensely, and I believe that I have written better code

int at_count = 0, dot_count = 0, error1 = 0, error2 = 0;
int i;
size_t length = strlen(email);
int ch1 = '@', ch2 = '.';

for ( i = 0; email[i] != '\0'; i++)  /* for loop to count the occurance of the character '@' */
    {
    if ( email[i] == ch1)
        at_count++;
    }

for ( i = 0; email[i] != '\0'; i++)  /* for loop to count the occurance of the character '.' */
    {
    if ( email[i] == ch2)
        dot_count++;
    }

if ( email[0] == ch1 || email[0] == ch2 || email[length-1] == ch1 || email[length-1] == ch2 )
        {
    error1++;
        }
else
        {
    error1 = 0;
        }


if ( strstr(email,".@") || strstr(email, "@.") || strstr(email, "..") || strstr(email, "@@"))
        {
    error2++;
        }
else
        {
    error2 = 0;
        }

if ( (at_count != 1) || (dot_count < 1) || (error1 == 1) || (error2 == 1))
    {
    printf("The user entered email address '%s' is invalid\n", email);
    }
else
    {
    printf("'%s' is a valid email address\n", email);
    }

I feel this is more elegant and simpler code, also more efficient.
My main inspiration was @chqrlie, as I felt his code was very nice and easy to read. Is there anyway I can improve?
(The email checks are only for practice, don't mind them!) Thank you very much everyone!

Upvotes: 3

Views: 766

Answers (3)

David C. Rankin
David C. Rankin

Reputation: 84662

Your professor has a good point about the inefficiency in repetitively scanning characters in email. Optimally, each character should be scanned only once. Whether you use a for loop and string indexing (e.g. email[i]) or simply walk-a-pointer down the email string is up to you, but you should be locating each character only once. Instead, in your current code you are doing

for every character in email, you

  • scan email 4-times with strstr to locate a given substring, and
  • scan to the end of email 2-times with strlen

Think about it. For every character in email, you are calling strlen twice which scans forward over the entire contents of email looking for the nul-terminating character. All four of your strstr calls are locating two character in differing combinations. You could at minimum scan for one or the other and then check the prior character and the one that follows.

@chqrlie points out additional character combinations and conditions that should be checked for, but since I presume this is a learning exercise rather than something intended for production code, it is enough to be aware that additional criteria are needed to make an e-mail validation routine.

While there is nothing wrong with including string.h and for longer strings (generally larger than 32-chars), the optimizations in the string.h function will provide varying degrees of improved efficiency, but there is no need to incur any function call overhead. Regardless what you are looking for in your input, you can always walk down your string with a pointer checking each character and taking the appropriate actions as needed.

A short additional example of that approach to your problem, using the lowly goto in lieu of a error flag, could look something like the following:

#include <stdio.h>

#define MAXC 1024

int main (void) {

    char buf[MAXC] = "",    /* buffer to hold email */
        *p = buf;           /* pointer to buf  */
    short at = 0;           /* counter for '@' */

    fputs ("enter e-mail address: ", stdout);
    if (fgets (buf, MAXC, stdin) == NULL) {     /* read/validate e-mail */
        fputs ("(user canceled input)\n", stderr);
        return 1;
    }

    while (*p && *p != '\n') {  /* check each character in e-mail */
        if (*p == '@')          /* count '@' - exactly 1 or fail */
            at++;
        if (p == buf && (*p == '@' || *p == '.'))   /* 1st char '@ or .' */
            goto emailerr;
        /* '@' followed or preceded by '.' */
        if (*p == '@' && (*(p+1) == '.' || (p > buf && *(p-1) == '.')))
            goto emailerr;
        /* sequential '.' */
        if (*p == '.' && (*(p+1) == '.' || (p > buf && *(p-1) == '.')))
            goto emailerr;
        p++;
    }   /* last char '@' or '.' */
    if (*(p-1) == '@' || *(p-1) == '.' || at != 1)
        goto emailerr;

    if (*p == '\n')     /* trim trailing '\n' (valid case) */
        *p = 0;

    printf ("The email '%s' is a valid e-mail address\n", buf);
    return 0;

  emailerr:;
    while (*p && *p != '\n')    /* locate/trim '\n' (invalid case) */
        p++;
    if (*p == '\n')
        *p = 0;
    printf ("The email '%s' is an invalid e-mail address\n", buf);
    return 1;
}

As mentioned there are many ways to go about the e-mail validation, and to a large degree you should not focus on "micro optimizations", but instead focus on writing logical code with sound validation. However, as your professor as pointed out, at that same time your logic should not be needlessly repetitive injecting inefficiencies into the code. Writing efficient code takes continual practice. A good way to get that practice is to write sever different versions of your code and then either dump your code to assembly and compare or time/profile your code in operation to get a sense of where inefficiencies may be. Have fun with it.

Look things over and let me know if you have further questions.

Upvotes: 3

chqrlie
chqrlie

Reputation: 145317

Your code indeed has multiple problems:

for (i = 0; email[i] != 0; i++) {   // you iterate for each character in the string.
    {   //this is a redundant block, remove the extra curly braces
        if (strstr(email, "@.") ||  // this test only needs to be performed once
            strstr(email, ".@") ||  // so does this one
            strstr(email, "..") ||  // so does this one
            strstr(email, "@@") ||  // again...
            email[i] == ch1 ||      // this test is only performed once
            email[i] == ch2 ||      // so is this one
            email[strlen(email) - 1] == ch1 ||  // this test is global
            email[strlen(email) - 1] == ch2) {  // so is this one
            printf("The entered e-mail '%s' does not pass the required parameters, Thus it is invalid\n", email);
        } else {
            printf("The email '%s' is a valid e-mail address\n", email);
        }
        break;  // you always break from the loop, why have a loop at all?
    }
}

You do scan the string 4 times to test the various patterns and another 2 times for strlen(). It should be possible to perform the same tests in the course of a single scan.

Note also that more problems go unnoticed:

  • there should be a single @ present
  • there should not be any spaces
  • more generally, the characters allowed in the address are limited.

Some of the tests seem overkill: why refuse .. before the @, why refuse a trailing . before the @?

Here is a more efficient version:

int at_count = 0;
int has_error = 0;
size_t i, len = strlen(email);

if (len == 0 || email[0] == ch1 || email[0] == ch2 ||
    email[len - 1] == ch1 || email[len - 1] == ch2) {
    has_error = 1;
}

for (i = 0; !has_error && i < len; i++) {
    if (email[i] == '.') {
        if (email[i + 1] == '.' || email[i + 1] == '@') {
            has_error = 1;
        }
    } else if (email[i] == '@') {
        at_count++;
        if (i == 0 || i == len - 1 || email[i + 1] == '.' || email[i + 1] == '@') {
            has_error = 1;
        }
    }
    // should also test for allowed characters         
}

if (has_error || at_count != 1) {
    printf("The entered e-mail '%s' does not pass the required tests, Thus it is invalid\n", email);
} else {
    printf("The email '%s' is a valid e-mail address\n", email);
}

Upvotes: 3

xing
xing

Reputation: 2528

Consider strpbrk. Possibly all conditions can be evaluated with one pass through the email.

#include <stdio.h>
#include <string.h>

int main( void) {
    char email[1000] = "";
    char at = '@';
    char dot = '.';
    char *find = NULL;
    char *atfind = NULL;
    char *dotfind = NULL;
    int atfound = 0;

    if ( fgets ( email, sizeof email, stdin)) {
        email[strcspn ( email, "\n")] = 0;//remove trailing newline
        find = email;
        while ( ( find = strpbrk ( find, "@."))) {//find a . or @
            if ( find == email) {
                printf ( "first character cannot be %c\n", *find);
                return 0;
            }
            if ( 0 == *( find + 1)) {
                printf ( "email must not end after %c\n", *find);
                return 0;
            }
            //captures .. @@ .@ @.
            if ( dot == *( find + 1)) {
                printf ( ". cannot follow %c\n", *find);
                return 0;
            }
            if ( at == *( find + 1)) {
                printf ( "@ cannot follow %c\n", *find);
                return 0;
            }
            if ( dot == *( find)) {
                dotfind = find;
            }
            if ( at == *( find)) {
                atfind = find;
                atfound++;
                if ( atfound > 1) {
                    printf ( "multiple @\n");
                    return 0;
                }
            }
            find++;
        }
        if ( !atfind) {
            printf ( "no @\n");
            return 0;
        }
        if ( !dotfind) {
            printf ( "no .\n");
            return 0;
        }
        if ( atfind > dotfind) {
            printf ( "subsequent to @, there must be a .\n");
            return 0;
        }
    }
    else {
        printf ( "problem fgets\n");
        return 0;
    }
    printf ( "good email\n");
    return 0;
}

Upvotes: 0

Related Questions