Laith

Reputation: 81

How to parse sequence of integers from string in C?

Hi, I am very new to C and I am confused about how to parse a line of text into integers. So far my code only parses the first number on the line: if my input is 10 20 30, only 10 is converted. I am looking for an idea of how to read the whole line with getline() and parse all of it into integer values.

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    char *line = NULL;
    size_t len = 0; 
    int val =0;
    int sum = 0;

    while (getline(&line, &len, stdin) != EOF) {
    
        printf("Line input : %s\n", line);
        //printf("Test %d", val);

        //parse char into integer 
        val = atoi(line);

        printf("Parsed integer: %d\n", val);
    }
    free(line); 
    return 0;
}

Upvotes: 2

Views: 1352

Answers (3)

Vlad

Reputation: 2165

There is a tradeoff between correctness and comprehensibility, so I wrote two versions of the program:

  • the first with extensive error handling
  • the second is simple: it assumes only the positive scenario (the input contains no errors) so that it stays easy to follow.

A sequence of integers contained in a C string can be parsed by calling the standard C function strtol from <stdlib.h> in a loop:

long int strtol (const char* str, char** endptr, int base);

It parses the C string str, interpreting its content as an integral number in the specified base. strtol skips leading white space, converts the integer and sets *endptr to point to the first character following the integer.
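As a small illustration of that endptr behaviour, here is a sketch; parse_one() is a hypothetical helper (not part of the GNU manual code), and base 10 is an assumption made for the example.

```c
#include <stdlib.h>

/* Hypothetical helper: parse one integer with strtol() and report how many
   characters the call consumed (leading whitespace, optional sign, digits). */
long parse_one(const char *str, size_t *consumed)
{
    char *end;
    long value = strtol(str, &end, 10);   /* base 10 for this sketch */
    *consumed = (size_t)(end - str);      /* 0 when nothing was converted */
    return value;
}
```

For example, parse_one("  42 rest", &n) returns 42 with n == 4 (two spaces plus two digits), while parse_one("abc", &n) returns 0 with n == 0, so the next search would start at the same position.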

Since the author has a variable sum in his code, let's demonstrate parsing a sequence of integers by summing it. I took the function sum_ints_from_string() from the GNU manual, section 20.11.1 Parsing of Integers. The code in the manual assumes the positive scenario, so I changed it for the first version.

#define _POSIX_C_SOURCE 200809L   /* for getline() */
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

long
sum_ints_from_string (char *string)
{
    long sum = 0;

    if (string == NULL)
        return 0;

    while (1) {
        char *tail;
        long next;   /* strtol() returns long; storing it in int could truncate */

        /* Skip whitespace by hand, to detect the end.  */
        while (isspace ((unsigned char) *string)) string++;
        if (*string == 0)
            break;

        /* There is more nonwhitespace,  */
        /* so it ought to be another number.  */
        errno = 0;
        /* Parse it.  */
        next = strtol (string, &tail, 0);
        /* Add it in, if possible.  */
        if (string == tail)
        {
            /* No conversion: skip the bad token without running past '\0' */
            while (*tail && !isspace ((unsigned char) *tail)) tail++;
            printf ("does not have the expected form: %s\n", string);
        }
        else if (errno == 0)
        {
            printf ("%ld\n", next);
            sum += next;
        }
        else
        {
            printf ("error: %s\n", strerror (errno));
            printf ("error: %s\n", string);
        }
        /* Advance past it.  */
        string = tail;
    }

    return sum;
}

int main (void)
{
    long sum = 0;
    size_t len = 0;
    char *line = NULL;   /* NULL with len == 0 lets getline() allocate */
    FILE *f = fopen ("file.txt", "w+");
    if (f == NULL)
    {
        perror ("Error opening file");
        return EXIT_FAILURE;
    }

    const char *text =  "010 0x10 -10 1111111111111111111111111111 0 30 A 10 +5 + 10 30\n"
                        "20 20B 6 ABC - 20 10 0";

    /* Not inside assert(): with NDEBUG defined the write would be skipped */
    if (fputs (text, f) == EOF)
    {
        perror ("error writing to file");
        fclose (f);
        return EXIT_FAILURE;
    }
    rewind (f);
    while (getline (&line, &len, f) != -1)
    {
        sum += sum_ints_from_string (line);
        printf ("%ld\n", sum);
    }
    free (line);   /* getline() reuses the buffer; free it even after failure */
    fclose (f);
    assert (sum == 175);
    return 0;
}

Second version - positive scenario:

#define _POSIX_C_SOURCE 200809L   /* for getline() */
#include <ctype.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

long
sum_ints_from_string (char *string)
{
    long sum = 0;

    while (1) {
        char *tail;
        long next;

        /* Skip whitespace by hand, to detect the end.  */
        while (isspace ((unsigned char) *string)) string++;
        if (*string == 0)
            break;

        /* There is more nonwhitespace,  */
        /* so it ought to be another number.  */
        errno = 0;
        /* Parse it.  */
        next = strtol (string, &tail, 0);
        /* Not a number: stop rather than loop forever at the same spot.  */
        if (tail == string)
            break;
        /* Add it in, if not overflow.  */
        if (errno)
            printf ("Overflow\n");
        else
            sum += next;
        /* Advance past it.  */
        string = tail;
    }

    return sum;
}

int main (void)
{
    long sum = 0;
    size_t len = 0;
    char *line = NULL;
    /*
        If line is set to NULL and len is set to 0 before the call, then
        getline() will allocate a buffer for storing the line (and reuse it
        on later calls). This buffer should be freed by the user program
        even if getline() failed.
     */
    while (getline (&line, &len, stdin) != -1)
        sum += sum_ints_from_string (line);
    free (line);
    printf ("%ld\n", sum);
    return 0;
}

Error checking is almost entirely skipped in the version from the GNU manual. According to the cppreference.com page for strtol:

Return value

  • If successful, an integer value corresponding to the contents of str is returned.
  • If the converted value falls out of range of corresponding return type, a range error occurs (setting errno to ERANGE) and LONG_MAX, LONG_MIN, LLONG_MAX or LLONG_MIN is returned.
  • If no conversion can be performed, 0 is returned.

For summation we are only interested in whether the next value can be added or not; we don't need granular, complex error checking here. We add nothing and print an error when the value is out of range (errno is set to ERANGE) or when no conversion can be performed. A return value of 0 alone cannot distinguish a failed conversion from an actual 0 in the input, so the first version detects the failure by checking tail == string. Otherwise we add next.
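The three cases in the quote above can be told apart with a sketch like the following. The enum parse_status and the function parse_next() are hypothetical names for illustration; as in the first version, the assumption is that a failed conversion is detected through the end pointer, not through the return value.

```c
#include <errno.h>
#include <limits.h>   /* LONG_MIN / LONG_MAX, the values returned on overflow */
#include <stdlib.h>

/* Hypothetical result codes for one strtol() call. */
enum parse_status { PARSE_OK, PARSE_NO_DIGITS, PARSE_OUT_OF_RANGE };

/* Parse the next integer at str; the value goes to *out and the position
   after it to *end. */
enum parse_status parse_next(const char *str, long *out, char **end)
{
    errno = 0;                     /* strtol() only sets errno on error */
    *out = strtol(str, end, 0);    /* base 0 accepts 10, 010 and 0x10 */
    if (*end == str)
        return PARSE_NO_DIGITS;    /* nothing converted, *out is 0 */
    if (errno == ERANGE)
        return PARSE_OUT_OF_RANGE; /* *out is LONG_MIN or LONG_MAX */
    return PARSE_OK;
}
```

With this split, "0" yields PARSE_OK with value 0, "A" yields PARSE_NO_DIGITS, and the 28-digit number from the test input yields PARSE_OUT_OF_RANGE with LONG_MAX.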

Upvotes: 0

chqrlie

Reputation: 144780

As you noticed, atoi() can only be used to parse the first value on the line read by getline(), and it has other shortcomings too: if the string does not convert to an integer, the return value will be 0, which is indistinguishable from the case where the string starts with a valid representation of 0.

There are more elaborate functions in <stdlib.h> to convert integers from their representation in different bases (from 2 to 36), detect conversion errors and provide a pointer to the rest of the string: strtol, strtoul, strtoll, strtoull etc.
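A minimal sketch of conversions in different bases, assuming the whole string must be a number (to_long() is a hypothetical helper). Base 0 lets strtol() pick the base from the prefix, so "0x1A" is read as hexadecimal and "0755" as octal.

```c
#include <stdlib.h>

/* Hypothetical helper: convert the entire string s in the given base.
   Returns 0 on success, -1 if there are no digits or trailing characters. */
int to_long(const char *s, int base, long *out)
{
    char *end;
    long v = strtol(s, &end, base);
    if (end == s || *end != '\0')
        return -1;   /* nothing converted, or junk after the number */
    *out = v;
    return 0;
}
```

For instance, "ff" in base 16 gives 255, "z" in base 36 gives 35, and "0755" in base 0 gives 493, while "12ab" in base 10 is rejected because of the trailing letters.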

As noted in comments, getline() is specified as returning the number of bytes read from the file or -1 on error. Do not compare to EOF.
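A sketch of that loop shape, assuming a POSIX.1-2008 system (count_lines() is a hypothetical helper): the return value of getline() is compared with -1, and the buffer is freed once after the loop.

```c
#define _POSIX_C_SOURCE 200809L   /* expose getline() (and fmemopen()) */
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: count the lines in a stream with getline(). */
long count_lines(FILE *fp)
{
    char *line = NULL;   /* getline() allocates and grows this buffer */
    size_t cap = 0;
    long count = 0;

    /* getline() returns the number of bytes read, or -1 at end/error */
    while (getline(&line, &cap, fp) != -1)
        count++;
    free(line);          /* free even though the last call failed */
    return count;
}
```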

Here is a modified version of your code using the function strtol():

#include <stdio.h>
#include <stdlib.h>
#include <string.h>   /* strspn() */

int main(void) {
    char *line = NULL;
    size_t len = 0; 

    while (getline(&line, &len, stdin) >= 0) {
        char *p, *end;
        long sum = 0;
    
        printf("Line input: %s\n", line);

        printf("Parsed integers:");
        for (p = line; *p != '\0'; p = end) {
            long val = strtol(p, &end, 10);
            if (end == p)
               break;
            printf(" %ld", val);
            sum += val;
        }
        printf("\nSum: %ld\n", sum);
        /* check if loop stopped on conversion error or end of string */
        p += strspn(p, " \t\r\n");  /* skip white space */
        if (*p) {
            printf("Invalid input: %s", p);
        }
    }
    free(line); 
    return 0;
}

Notes:

  • getline is not part of the C Standard; it is a POSIX function (standardized in POSIX.1-2008), so it might not be available on all systems or might have different semantics.
  • strtol() performs range checking: if the converted value exceeds the range of type long, the value returned is either LONG_MIN or LONG_MAX depending on the direction of the overflow and errno is set to ERANGE.
  • sum += val; can also cause an arithmetic overflow.
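The last note can be handled by checking before adding. This sketch (add_checked() is a hypothetical helper) compares against LONG_MAX and LONG_MIN so the addition itself never overflows; GCC and Clang also provide __builtin_add_overflow() for the same purpose, but that is a compiler extension.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: add val to *sum only if the result fits in a long.
   Returns false and leaves *sum unchanged when it would overflow. */
bool add_checked(long *sum, long val)
{
    if ((val > 0 && *sum > LONG_MAX - val) ||   /* would exceed LONG_MAX */
        (val < 0 && *sum < LONG_MIN - val))     /* would go below LONG_MIN */
        return false;
    *sum += val;
    return true;
}
```

In the loop above, sum += val; would become a call such as if (!add_checked(&sum, val)) followed by whatever error handling suits the program.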

Upvotes: 0

Jonathan Leffler

Reputation: 754110

As I noted in comments, it is probably best to use strtol() (or one of the other members of the strtoX() family of functions) to convert the string to integers. Here is code that pays attention to the points raised in Correct usage of strtol().

#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *line = NULL;
    size_t len = 0;

    while (getline(&line, &len, stdin) != -1)
    {
        printf("Line input : [%s]\n", line);
        int val = atoi(line);
        printf("Parsed integer: %d\n", val);

        char *start = line;
        char *eon;
        long value;
        errno = 0;
        while ((value = strtol(start, &eon, 0)),
               eon != start &&
               !((errno == EINVAL && value == 0) ||
                 (errno == ERANGE && (value == LONG_MIN || value == LONG_MAX))))
        {
            printf("%ld\n", value);
            start = eon;
            errno = 0;
        }
        putchar('\n');
    }
    free(line);
    return 0;
}

The code in the question that reads lines using POSIX getline() is almost correct: it is legitimate to pass the function a pointer to a null pointer and a pointer to a size_t holding 0. However, technically, getline() returns -1 rather than EOF, though there are very few (if any) systems where there is a difference. Nevertheless, standard C allows EOF to be any negative value; it is not required to be -1.

For the extreme nitpickers: although the Linux and macOS man pages for strtol() state "returns 0 and sets errno to EINVAL" when it fails to convert the string, the C standard doesn't require errno to be set in that case. However, when the conversion fails, eon will be set to start; that is guaranteed by the standard. So there is room to argue that the EINVAL part of the test is superfluous.

The while loop uses a comma operator to call strtol() for its side-effects (assigning to value and eon), and ignores the result — and ignoring it is necessary because all possible return values are valid. The other three lines of the condition (the RHS of the comma operator) evaluate whether the conversion was successful. This avoids writing the call to strtol() twice. It's possibly an extreme case of DRY (don't repeat yourself) programming.
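For readers who find the comma operator hard to follow, here is a sketch of the same stopping rules written as a plain loop with explicit breaks. parse_longs() and its fixed-size output array are hypothetical; it also drops the EINVAL test and relies on eon == start, as the previous paragraph argues is sufficient.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: store up to max integers from str into out[] and
   return how many were stored. Stops when nothing is converted (end of
   string or junk) or when strtol() reports ERANGE. */
size_t parse_longs(const char *str, long out[], size_t max)
{
    size_t n = 0;
    while (n < max) {
        char *eon;
        errno = 0;
        long value = strtol(str, &eon, 0);
        if (eon == str)          /* no conversion performed */
            break;
        if (errno == ERANGE)     /* value is LONG_MIN or LONG_MAX */
            break;
        out[n++] = value;
        str = eon;               /* continue after this number */
    }
    return n;
}
```

The cost is that strtol() appears once but the stop conditions are spread over two if statements instead of one condition. On the three sample lines below it stores 6, 4 and 1 values respectively.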

Small sample of running the code (program name rn89):

$ rn89
   1  2    4  5       5  6
Line input : [   1  2    4  5       5  6
]
Parsed integer: 1
1
2
4
5
5
6

232443 432435423 12312 1232413r2  
Line input : [232443 432435423 12312 1232413r2
]
Parsed integer: 232443
232443
432435423
12312
1232413

324d
Line input : [324d
]
Parsed integer: 324
324

$

Upvotes: 2
