Steven Lu

Reputation: 43427

What is the implementation of `strtol`?

I'm just curious about this. strtol does not require you to specify the number of bytes to process, so in theory it could be fed a string containing an endless sequence of digits to consume, leading to a denial-of-service attack. Of course, this is easily thwarted by realizing that once the precision of the long has been exhausted (it couldn't really be more than 65 chars of a binary number), there is no point in reading any further.

However, strtol is also required to discard as many whitespace characters as necessary until the first non-whitespace character is encountered. So could it not be attacked with an endless whitespace string even if it is smart about reading digits?

Upvotes: 3

Views: 10909

Answers (4)

Explorer09

Reputation: 724

My personal implementation. I didn't use any lookahead (accessing p[1] or something like that), so theoretically you can convert this to something that reads from a stream (e.g. a get_long() that calls getc() for characters).

#include <errno.h>
#define LONG_MAX ((long)(~0UL>>1))
#define LONG_MIN (~LONG_MAX)
int isspace(int c); /* <-- Forward declare from <ctype.h> */ 

long strtol(const char *restrict nptr, char **restrict endptr, int base) {
    const char *p = nptr, *endp;
    _Bool is_neg = 0, overflow = 0;
    /* Need unsigned so (-LONG_MIN) can fit in these: */
    unsigned long n = 0UL, cutoff;
    int cutlim;
    if (base < 0 || base == 1 || base > 36) {
#ifdef EINVAL /* errno value defined by POSIX */
        errno = EINVAL;
#endif
        return 0L;
    }
    endp = nptr;
    while (isspace((unsigned char)*p)) /* cast: a negative char is undefined for isspace */
        p++;
    if (*p == '+') {
        p++;
    } else if (*p == '-') {
        is_neg = 1, p++;
    }
    if (*p == '0') {
        p++;
        /* For strtol(" 0xZ", &endptr, 16), endptr should point to 'x';
         * pointing to ' ' or '0' is non-compliant.
         * (Many implementations do this wrong.) */
        endp = p;
        if (base == 16 && (*p == 'X' || *p == 'x')) {
            p++;
        } else if (base == 2 && (*p == 'B' || *p == 'b')) {
            /* C23 standard supports "0B" and "0b" prefixes. */
            p++;
        } else if (base == 0) {
            if (*p == 'X' || *p == 'x') {
                base = 16, p++;
            } else if (*p == 'B' || *p == 'b') {
                base = 2, p++;
            } else {
                base = 8;
            }
        }
    } else if (base == 0) {
        base = 10;
    }
    cutoff = (is_neg) ? -(LONG_MIN / base) : LONG_MAX / base;
    cutlim = (is_neg) ? -(LONG_MIN % base) : LONG_MAX % base;
    while (1) {
        int c;
        if (*p >= 'A')
            c = ((*p - 'A') & (~('a' ^ 'A'))) + 10;
        else if (*p <= '9')
            c = *p - '0';
        else
            break;
        if (c < 0 || c >= base) break;
        endp = ++p;
        if (overflow) {
            /* endptr should go forward and point to the non-digit character
             * (of the given base); required by ANSI standard. */
            if (endptr) continue;
            break;
        }
        if (n > cutoff || (n == cutoff && c > cutlim)) {
            overflow = 1; continue;
        }
        n = n * base + c;
    }
    if (endptr) *endptr = (char *)endp;
    if (overflow) {
        errno = ERANGE; return ((is_neg) ? LONG_MIN : LONG_MAX);
    }
    return (long)((is_neg) ? -n : n);
}
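For what it's worth, here is a minimal sketch of how a caller might observe the overflow and endptr behavior handled in the code above, using whatever strtol the C library provides. The names parse_status and parse_long are hypothetical, made up for this example; only strtol, errno and ERANGE come from the standard.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical result codes for this sketch. */
enum parse_status { PARSE_OK, PARSE_NO_DIGITS, PARSE_OUT_OF_RANGE };

/* Hypothetical helper: convert the leading integer in s.
 * *consumed receives the number of characters strtol moved past.
 * *out receives the value (clamped to LONG_MIN/LONG_MAX on overflow). */
enum parse_status parse_long(const char *s, int base, long *out, size_t *consumed)
{
    char *end;
    long v;

    errno = 0;                      /* strtol never clears errno itself */
    v = strtol(s, &end, base);
    *consumed = (size_t)(end - s);

    if (end == s)                   /* no conversion was performed */
        return PARSE_NO_DIGITS;

    *out = v;
    if (errno == ERANGE)            /* value did not fit in a long */
        return PARSE_OUT_OF_RANGE;
    return PARSE_OK;
}
```

Calling parse_long("  -42abc", 10, &v, &n) should give -42 with n == 5, while a string of 40 nines should report PARSE_OUT_OF_RANGE with v clamped to LONG_MAX and n still covering all 40 digits, since endptr is moved past the whole digit sequence even after overflow.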

Upvotes: 0

user786653

Reputation: 30460

> However, strtol is also required to discard as many whitespace characters as necessary until the first non-whitespace character is encountered. So could it not be attacked with an endless whitespace string even if it is smart about reading digits?

As strtol works on a string already in memory, you would have had to store (and read from an attacker) an "endless" amount of whitespace (or have forgotten to NUL-terminate your string) before even feeding it to strtol.
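To illustrate the NUL-termination point, here is a rough sketch of a guard a caller could put in front of strtol when the bytes come from an untrusted source. The helper name parse_long_from_buffer and its return convention are assumptions for this example, not anything from a real library.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: buf holds len bytes received from an untrusted source.
 * Returns 0 and stores the value in *out on success, -1 otherwise. */
int parse_long_from_buffer(const char *buf, size_t len, long *out)
{
    char *end;
    long v;

    /* Without a terminator inside the buffer, strtol could run off the end
     * while skipping whitespace or reading digits. */
    if (memchr(buf, '\0', len) == NULL)
        return -1;

    errno = 0;
    v = strtol(buf, &end, 10);
    if (end == buf || errno == ERANGE)   /* no digits, or out of range */
        return -1;

    *out = v;
    return 0;
}
```

With this guard, the amount of whitespace strtol can skip is bounded by len, which the caller already chose when reading the input.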

Since an implementation can calculate the maximum number of digits there can ever be in a valid string, it doesn't have to keep going, as you suspect.
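As a small sketch of that calculation: the function below counts how many digits the largest magnitude a long can hold needs in a given base. The name max_significant_digits is hypothetical, and the code assumes a two's complement long, so that the magnitude of LONG_MIN is LONG_MAX + 1. Leading zeros are not counted.

```c
#include <limits.h>

/* Hypothetical helper: number of significant digits needed to write the
 * magnitude of LONG_MIN in the given base (2..36).
 * Assumes two's complement, i.e. |LONG_MIN| == LONG_MAX + 1. */
int max_significant_digits(int base)
{
    unsigned long v = (unsigned long)LONG_MAX + 1UL;
    int digits = 0;

    while (v != 0) {
        v /= (unsigned long)base;   /* strip one digit per iteration */
        digits++;
    }
    return digits;
}
```

On a platform with a 64-bit long this gives 64 for base 2, which together with a sign character matches the "65 chars" estimate in the question, and 19 for base 10.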

DoS attacks can occur with faulty implementations, though; check out this related case (this was in Java and PHP when reading doubles, but the same could occur in a C or C++ implementation).

Upvotes: 4

edgarmtze

Reputation: 25048

Well, if you want to see an implementation of strtol, you can look at this one from the University of California:

/* 
 * strtol.c --
 *
 *  Source code for the "strtol" library procedure.
 *
 * Copyright (c) 1988 The Regents of the University of California.
 * All rights reserved.
 *
 * Permission is hereby granted, without written agreement and without
 * license or royalty fees, to use, copy, modify, and distribute this
 * software and its documentation for any purpose, provided that the
 * above copyright notice and the following two paragraphs appear in
 * all copies of this software.
 * 
 * IN NO EVENT SHALL THE UNIVERSITY OF CALIFORNIA BE LIABLE TO ANY PARTY FOR
 * DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT
 * OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF THE UNIVERSITY OF
 * CALIFORNIA HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 *
 * THE UNIVERSITY OF CALIFORNIA SPECIFICALLY DISCLAIMS ANY WARRANTIES,
 * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
 * AND FITNESS FOR A PARTICULAR PURPOSE.  THE SOFTWARE PROVIDED HEREUNDER IS
 * ON AN "AS IS" BASIS, AND THE UNIVERSITY OF CALIFORNIA HAS NO OBLIGATION TO
 * PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
 */
static const char rcsid[] = "$Header$ SPRITE (Berkeley)";

#include <ctype.h>

extern unsigned long int strtoul(char *string, char **endPtr, int base);

/*
 *----------------------------------------------------------------------
 *
 * strtol --
 *
 *  Convert an ASCII string into an integer.
 *
 * Results:
 *  The return value is the integer equivalent of string.  If endPtr
 *  is non-NULL, then *endPtr is filled in with the character
 *  after the last one that was part of the integer.  If string
 *  doesn't contain a valid integer value, then zero is returned
 *  and *endPtr is set to string.
 *
 * Side effects:
 *  None.
 *
 *----------------------------------------------------------------------
 */

long int
strtol(
    char *string,       /* String of ASCII digits, possibly
                 * preceded by white space.  For bases
                 * greater than 10, either lower- or
                 * upper-case digits may be used.
                 */
    char **endPtr,      /* Where to store address of terminating
                 * character, or NULL. */
    int base            /* Base for conversion.  Must be less
                 * than 37.  If 0, then the base is chosen
                 * from the leading characters of string:
                 * "0x" means hex, "0" means octal, anything
                 * else means decimal.
                 */
)
{
    register char *p;
    int result;

    /*
     * Skip any leading blanks.
     */
    p = string;
    while (isspace(*p)) {
        p += 1;
    }

    /*
     * Check for a sign.
     */
    if (*p == '-') {
        p += 1;
        result = -1*(strtoul(p, endPtr, base));
    } else {
        if (*p == '+') {
            p += 1;
        }
        result = strtoul(p, endPtr, base);
    }
    if ((result == 0) && (endPtr != 0) && (*endPtr == p)) {
        *endPtr = string;
    }
    return result;
}

Upvotes: -1

Keith Thompson

Reputation: 263267

There is no single implementation of strtol. I doubt that any implementation is susceptible to the kind of attack you describe; the obvious implementation would just traverse the sequence of digits without storing them all at once. (Note that the digit sequence can be arbitrarily long due to leading 0s.)
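A quick way to see the leading-zeros point for yourself is a sketch like the following, which builds a long run of '0' characters followed by a single '7' and hands it to the library's strtol. The function name parse_padded_seven is made up for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical demo: build "000...07" with `zeros` leading zeros, parse it
 * in base 10, and report how many characters strtol consumed.
 * Returns -1 (with *consumed == 0) if the allocation fails. */
long parse_padded_seven(size_t zeros, size_t *consumed)
{
    char *s = malloc(zeros + 2);    /* zeros + '7' + terminator */
    char *end;
    long v;

    if (s == NULL) {
        *consumed = 0;
        return -1;
    }
    memset(s, '0', zeros);
    s[zeros] = '7';
    s[zeros + 1] = '\0';

    v = strtol(s, &end, 10);
    *consumed = (size_t)(end - s);  /* computed before the buffer is freed */
    free(s);
    return v;
}
```

With 100000 leading zeros the result should still be 7, with all 100001 characters consumed. The work is linear in the length of a string the caller already had in memory, and the conversion itself only needs to keep a running value.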

If you want to see the code for an implementation, you can download the glibc version here; strtol() is in stdlib/strtol.c.

Upvotes: 2
