Reputation: 45
While reviewing my code, my professor said that my use of strstr
and strchr
results in a lot of wasted resources as every and each one of them scans the string.
Can I reduce the amount of functions in a good way?
This code scans a string and based on set parameters decides whether the input is valid or not.
ch1
is '@'
and ch2
is '.'
, (email[i]
) is the string.
for (i = 0; email[i] != 0; i++) {
{
if (strstr(email, "@.") ||
strstr(email, ".@") ||
strstr(email, "..") ||
strstr(email, "@@") ||
email[i] == ch1 ||
email[i] == ch2 ||
email[strlen(email) - 1] == ch1 ||
email[strlen(email) - 1] == ch2) {
printf("The entered e-mail '%s' does not pass the required parameters, Thus it is invalid\n", email);
} else {
printf("The email '%s' is a valid e-mail address\n",email);
}
break;
}
}
This is the snippet I'm talking about.
Should I write my own code that does the checking once? if so, can you give me some pointers in that regards? thank you.
EDIT: Thank you very much for your responses, I did learn of the mistakes in my code and hopefully I learn from them.
Thanks again!
EDIT:2: I want to thank you again for your responses, they have helped me immensely, and I believe that I have written better code
int at_count = 0, dot_count = 0, error1 = 0, error2 = 0;
int i;
size_t length = strlen(email);
int ch1 = '@', ch2 = '.';
for ( i = 0; email[i] != '\0'; i++) /* for loop to count the occurance of the character '@' */
{
if ( email[i] == ch1)
at_count++;
}
for ( i = 0; email[i] != '\0'; i++) /* for loop to count the occurance of the character '.' */
{
if ( email[i] == ch2)
dot_count++;
}
if ( email[0] == ch1 || email[0] == ch2 || email[length-1] == ch1 || email[length-1] == ch2 )
{
error1++;
}
else
{
error1 = 0;
}
if ( strstr(email,".@") || strstr(email, "@.") || strstr(email, "..") || strstr(email, "@@"))
{
error2++;
}
else
{
error2 = 0;
}
if ( (at_count != 1) || (dot_count < 1) || (error1 == 1) || (error2 == 1))
{
printf("The user entered email address '%s' is invalid\n", email);
}
else
{
printf("'%s' is a valid email address\n", email);
}
I feel this is more elegant and simpler code, also more efficient.
My main inspiration was @chqrlie, as I felt his code was very nice and easy to read.
Is there anyway I can improve?
(The email checks are only for practice, don't mind them!)
Thank you very much everyone!
Upvotes: 3
Views: 766
Reputation: 84662
Your professor has a good point about the inefficiency in repetitively scanning characters in email
. Optimally, each character should be scanned only once. Whether you use a for
loop and string indexing (e.g. email[i]
) or simply walk-a-pointer down the email
string is up to you, but you should be locating each character only once. Instead, in your current code you are doing
for
every character in email
, you
email
4-times with strstr
to locate a given substring, and email
2-times with strlen
Think about it. For every character in email
, you are calling strlen
twice which scans forward over the entire contents of email
looking for the nul-terminating character. All four of your strstr
calls are locating two character in differing combinations. You could at minimum scan for one or the other and then check the prior character and the one that follows.
@chqrlie points out additional character combinations and conditions that should be checked for, but since I presume this is a learning exercise rather than something intended for production code, it is enough to be aware that additional criteria are needed to make an e-mail validation routine.
While there is nothing wrong with including string.h
and for longer strings (generally larger than 32-chars), the optimizations in the string.h
function will provide varying degrees of improved efficiency, but there is no need to incur any function call overhead. Regardless what you are looking for in your input, you can always walk down your string with a pointer checking each character and taking the appropriate actions as needed.
A short additional example of that approach to your problem, using the lowly goto
in lieu of a error flag, could look something like the following:
#include <stdio.h>
#define MAXC 1024
int main (void) {
char buf[MAXC] = "", /* buffer to hold email */
*p = buf; /* pointer to buf */
short at = 0; /* counter for '@' */
fputs ("enter e-mail address: ", stdout);
if (fgets (buf, MAXC, stdin) == NULL) { /* read/validate e-mail */
fputs ("(user canceled input)\n", stderr);
return 1;
}
while (*p && *p != '\n') { /* check each character in e-mail */
if (*p == '@') /* count '@' - exactly 1 or fail */
at++;
if (p == buf && (*p == '@' || *p == '.')) /* 1st char '@ or .' */
goto emailerr;
/* '@' followed or preceded by '.' */
if (*p == '@' && (*(p+1) == '.' || (p > buf && *(p-1) == '.')))
goto emailerr;
/* sequential '.' */
if (*p == '.' && (*(p+1) == '.' || (p > buf && *(p-1) == '.')))
goto emailerr;
p++;
} /* last char '@' or '.' */
if (*(p-1) == '@' || *(p-1) == '.' || at != 1)
goto emailerr;
if (*p == '\n') /* trim trailing '\n' (valid case) */
*p = 0;
printf ("The email '%s' is a valid e-mail address\n", buf);
return 0;
emailerr:;
while (*p && *p != '\n') /* locate/trim '\n' (invalid case) */
p++;
if (*p == '\n')
*p = 0;
printf ("The email '%s' is an invalid e-mail address\n", buf);
return 1;
}
As mentioned there are many ways to go about the e-mail validation, and to a large degree you should not focus on "micro optimizations", but instead focus on writing logical code with sound validation. However, as your professor as pointed out, at that same time your logic should not be needlessly repetitive injecting inefficiencies into the code. Writing efficient code takes continual practice. A good way to get that practice is to write sever different versions of your code and then either dump your code to assembly and compare or time/profile your code in operation to get a sense of where inefficiencies may be. Have fun with it.
Look things over and let me know if you have further questions.
Upvotes: 3
Reputation: 145317
Your code indeed has multiple problems:
for (i = 0; email[i] != 0; i++) { // you iterate for each character in the string.
{ //this is a redundant block, remove the extra curly braces
if (strstr(email, "@.") || // this test only needs to be performed once
strstr(email, ".@") || // so does this one
strstr(email, "..") || // so does this one
strstr(email, "@@") || // again...
email[i] == ch1 || // this test is only performed once
email[i] == ch2 || // so is this one
email[strlen(email) - 1] == ch1 || // this test is global
email[strlen(email) - 1] == ch2) { // so is this one
printf("The entered e-mail '%s' does not pass the required parameters, Thus it is invalid\n", email);
} else {
printf("The email '%s' is a valid e-mail address\n", email);
}
break; // you always break from the loop, why have a loop at all?
}
}
You do scan the string 4 times to test the various patterns and another 2 times for strlen()
. It should be possible to perform the same tests in the course of a single scan.
Note also that more problems go unnoticed:
@
presentSome of the tests seem overkill: why refuse ..
before the @
, why refuse a trailing .
before the @
?
Here is a more efficient version:
int at_count = 0;
int has_error = 0;
size_t i, len = strlen(email);
if (len == 0 || email[0] == ch1 || email[0] == ch2 ||
email[len - 1] == ch1 || email[len - 1] == ch2) {
has_error = 1;
}
for (i = 0; !has_error && i < len; i++) {
if (email[i] == '.') {
if (email[i + 1] == '.' || email[i + 1] == '@') {
has_error = 1;
}
} else if (email[i] == '@') {
at_count++;
if (i == 0 || i == len - 1 || email[i + 1] == '.' || email[i + 1] == '@') {
has_error = 1;
}
}
// should also test for allowed characters
}
if (has_error || at_count != 1) {
printf("The entered e-mail '%s' does not pass the required tests, Thus it is invalid\n", email);
} else {
printf("The email '%s' is a valid e-mail address\n", email);
}
Upvotes: 3
Reputation: 2528
Consider strpbrk
. Possibly all conditions can be evaluated with one pass through the email.
#include <stdio.h>
#include <string.h>
int main( void) {
char email[1000] = "";
char at = '@';
char dot = '.';
char *find = NULL;
char *atfind = NULL;
char *dotfind = NULL;
int atfound = 0;
if ( fgets ( email, sizeof email, stdin)) {
email[strcspn ( email, "\n")] = 0;//remove trailing newline
find = email;
while ( ( find = strpbrk ( find, "@."))) {//find a . or @
if ( find == email) {
printf ( "first character cannot be %c\n", *find);
return 0;
}
if ( 0 == *( find + 1)) {
printf ( "email must not end after %c\n", *find);
return 0;
}
//captures .. @@ .@ @.
if ( dot == *( find + 1)) {
printf ( ". cannot follow %c\n", *find);
return 0;
}
if ( at == *( find + 1)) {
printf ( "@ cannot follow %c\n", *find);
return 0;
}
if ( dot == *( find)) {
dotfind = find;
}
if ( at == *( find)) {
atfind = find;
atfound++;
if ( atfound > 1) {
printf ( "multiple @\n");
return 0;
}
}
find++;
}
if ( !atfind) {
printf ( "no @\n");
return 0;
}
if ( !dotfind) {
printf ( "no .\n");
return 0;
}
if ( atfind > dotfind) {
printf ( "subsequent to @, there must be a .\n");
return 0;
}
}
else {
printf ( "problem fgets\n");
return 0;
}
printf ( "good email\n");
return 0;
}
Upvotes: 0