Hemmelig
Hemmelig

Reputation: 794

See if number is smaller than 2^32

I am reading a huge chunk of numbers via the stdin and I need to find it, if numbers are in the range of [0,2^32] (probably 2^32-1, I am not sure with that), so it’s also not accepting negative numbers. There are also situations where there is a number beginning with hundreds of nulls, which I need to ignore. I am sure if I am doing something wrong with the data type, at the moment I am using „long“ as data type, because I thought this is always with a maximum of 2^32. so if there is an overflow, I am getting a negative number and I can prove if the long is smaller than 0. But now I realised the size of long is also depending on the system of the computer.

Is there anyone who can tell me which data type and operation I should choose to proof this? The beginning datatype is just a char pointer.

Upvotes: 2

Views: 6106

Answers (5)

Nominal Animal
Nominal Animal

Reputation: 39406

You can use the standard strtoul() function for the [0, 232-1] range (inclusive).

I recommend checking the pointer strtoul() assigns (ends, below) to detect cases like "O" (letter O, not zero) and "1O" (one and letter O, not ten); i.e., that the input really did have a decimal number, and ended with a suitable separator character or end-of-string mark.

For example:

#include <stdlib.h>
#include <inttypes.h>
#include <errno.h>

const char *next_u32(const char *from, uint32_t *to)
{
    const char    *ends = from;
    unsigned long  value;

    if (!from) {
        errno = EINVAL; /* NULL from pointer */
        return NULL;
    }

    /* Parse the number to 'value' */
    errno = 0;
    value = strtoul(from, (char **)ends, 10); /* 10 = decimal input */
    if (errno)
        return NULL;
    if (ends == from) {
        errno = EDOM; /* Not a valid number */
        return NULL;
    }

    /* Is it too large? */
    if (value > 4294967295uL) {
        errno = ERANGE; /* Out of valid range */
        return NULL;
    }

    /* Verify the separator is a space or end-of-string. */
    if (*ends != '\0' && *ends != '\t' && *ends != '\n' &&
        *ends != '\v' && *ends != '\f' && *ends != '\r' &&
        *ends != ' ') {
        errno = EDOM; /* The number was immediately followed by garbage. */
        return NULL;
    }

    /* Accepted. Save value, and return the pointer to the
       first unparsed character. */
    if (to)
        *to = (uint32_t)value;

    return ends;
}

To parse all 32-bit unsigned integers on a line, you can use a loop:

const char *line;  /* Contains the line with decimal integers to parse */
uint32_t    value;

while ((line = next_u32(line, &value))) {
    /* Use 'value' */
}

Standard libraries' strtoul() is not the fastest option, if you are parsing a huge amount of unsigned integers in decimal.

Also, sometimes it is easier to just memory-map the input file, and let the OS handle the paging. In that case, however, there is no end-of-string NUL byte (\0) at the end, so a different interface needs to be used.

For those cases, you can use the following function. It takes and updates the pointer to current position in the contents, but never exceeds the end pointer. It returns zero if the next item in the input is a decimal 32-bit unsigned integer, a nonzero error code otherwise.

#define  NEXT_OK    0
#define  NEXT_END   1
#define  NEXT_BAD  -1
#define  NEXT_OVER -2

static int next_u32(const char **fromptr, const char *end,
                    uint32_t *to)
{
    const char *from;
    uint32_t    val = 0;

    if (!fromptr)
        return NEXT_END;

    from = *fromptr;

    /* Skip whitespace characters, including NUL bytes. */
    while (from < end && (*from == '\0' || *from == '\t' ||
                          *from == '\n' || *from == '\v' ||
                          *from == '\f' || *from == '\r' ||
                          *from == ' '))
        from++;

    /* At end? */
    if (from >= end)
        return NEXT_END;

    /* Skip a leading + sign. */
    if (*from == '+' && end - from > 1)
        from++;

    /* Must be a decimal digit now. */
    if (!(*from >= '0' && *from <= '9'))
        return NEXT_BAD;

    /* Skip leading zeroes. */
    while (from < end && *from == '0')
        from++;

    /* Parse the rest of the decimal number, if any. */
    while (from < end && *from >= '0' && *from <= '9') {
        if ((value > 429496729) || (value == 429496729 && *from >= '6'))
            return NEXT_OVER;
        value = (10 * value) + (*(from++) - '0');
    }

    /* If not at end, check the character. */
    if (from < end && *from != '\0' && *from != '\t' &&
                      *from != '\n' && *from != '\v' &&
                      *from != '\f' && *from != '\r' &&
                      *from != ' ')
        return NEXT_BAD;

    /* Parsed correctly. */
    *fromptr = from;
    if (*to)
        *to = value;

    return NEXT_OK;
}

Upvotes: 2

Stephan Lechner
Stephan Lechner

Reputation: 35154

It's a little bit more tricky than it appears at first sight. The reason is that when you allow arbitrarily long input sequences, you might even exceed the maximum data type available, e.g even 64 bit might be too less.

Depending on which method you use for reading in numbers, an overflow in datatype is detected or not. For example, scanf("%lu",...) may lead to undefined behaviour if the processed integral value does not fit into an unsigned long (cf., for example, this online c11 draft concerning scanf):

10 ... If this object does not have an appropriate type, or if the result of the conversion cannot be represented in the object, the behavior is undefined.

So don't use scanf for arbitrary input.

Function strtoul, in contrast, has a defined overflow behaviour (again from the online c11 draft, concerning strtol):

8) The strtol, strtoll, strtoul, and strtoull functions return the converted value, if any. If no conversion could be performed, zero is returned. If the correct value is outside the range of representable values, LONG_MIN, LONG_MAX, LLONG_MIN, LLONG_MAX, ULONG_MAX, or ULLONG_MAX is returned (according to the return type and sign of the value, if any), and the value of the macro ERANGE is stored in errno.

You can use strtol since it will give you a number with at least 32 bits, and it tells you overflows. If long is 64 bits, you can/need to distinguish general overflow or 32 bit overflow. See the following code illustrating this:

#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

void convert(const char* numStr) {

    errno=0;
    long num = strtol(numStr,NULL,10);

    if (errno == ERANGE){
        printf("numstr %s is out of long's range, which is %ld..%ld\n", numStr, LONG_MIN,LONG_MAX);
    } else if (num < 0) {
        printf("numstr %s is negative.\n", numStr);
    } else if (num > UINT32_MAX) {
        printf("numstr %s is out of 32 bit range, which is 0..%u\n", numStr, UINT32_MAX);
    } else {
        printf("OK; numstr %s is in 32 bit range, which is 0..%u\n", numStr, UINT32_MAX);
    }

}


int main() {

    convert("123456789012345678901234567890");
    convert("-123");
    convert("1234567890123567");
    convert("32452345");
    convert("0000000000000000000000032452345");
}

Output:

numstr 123456789012345678901234567890 is out of long's range, which is -9223372036854775808..9223372036854775807
numstr -123 is negative.
numstr 1234567890123567 is out of 32 bit range, which is 0..4294967295
OK; numstr 32452345 is in 32 bit range, which is 0..4294967295
OK; numstr 0000000000000000000000032452345 is in 32 bit range, which is 0..4294967295

Upvotes: 6

DDS
DDS

Reputation: 2478

I suggest to use int64_t to be sure that your number has same size on every architechture. you'll have 2^32 "positive or null" numbers and anything that is below zero is either a overflow or a negative (only really negative numbers will became positive again). But you can circumvent this by reading sign as character than number as int64_t so anything that is negative will be an overflow (because you catched the '-' sign before)

This is 'PSEUDO-code' to illustrate the process:

char sign = '\0'
uint64_t = 0
read (%c,&sign)
if (c=='-' or c=='+' or isdigit(c))

    if isdigit(c) 
                unget()
    read(%ull, &number) //read as unsigned as only negative are 2-complement
    if number < 0
                // the number is too big

    else
               //number is < 2^32       


else
   // number is "truly" negative

Upvotes: 1

William J Bagshaw
William J Bagshaw

Reputation: 565

You cannot detect overflow of a 32 bit number by using a 32 bit number. If it is unsigned it will always be understood as a number between 0 and 2^32 -1. A overflowing input would still result is a valid out. A signed 32 bit value would be "valid" in the range 0 - 2^31-1. Negative values you would be considered invalid. An over/under flowing input between -2^31 and 0 or 2^31 and 2^32-1 would result in an invalid negative number. However, you may find that numbers above 2^32 would appear as valid again. I suggest you use a signed 64 bit number and treat negative or large numbers as invalid input. This gives you a much greater range of input values that would be filtered correctly. If for example, input was comma limited and the commas had been omitted, there could still be issues. I would suggest the input number as a string should not exceed a limiting length. The length filter should allow strings that represent numbers above 2^32 but should filter out numbers that would be greater than 2^63. Somewhere in the middle. To test the size of a type use "sizeof()". eg sizeof(long), sizeof(long long) etc. However there are usually explicitly sized integers for your platform. For portability use your own types and using a typedef and keep the platform dependent code localised to an include file dedicated to platform dependency only.

Upvotes: 1

lxop
lxop

Reputation: 8615

It depends on the details of your situation. If your application is reading completely arbitary input, then you will want to do some preprocessing on the incoming text. At this point you can detect minus signs (and reject the input) and you can strip leading zeroes.

After that, you can easily check the length of the number string, and if it is more than 10 digits (2^32 = 4294967296 which has 10 digits) then reject the input. If it is shorter than 10 digits, then you know it is in the [0..2^32-1] range.

If it has exactly 10 digits, then you have at least a couple of options:

  • parse it into an integer with more than 32 bits (e.g. a 64-bit type) and directly check if it is < 2^32
  • read it character-by-character, comparing it to the digits of 2^32. This could be easily implemented by doing a string cmp.

Which of these you use depends on what your limitations are.

Upvotes: 2

Related Questions