soramimo

Reputation: 1326

Pointer arithmetic in C++ using char*

I'm having trouble understanding what the difference between these two code snippets is:

// out is of type char* of size N*D
// N, D are of type int


for (int i=0; i!=N; i++){
    if (i % 1000 == 0){
        std::cout << "i=" << i << std::endl;
    }
    for (int j=0; j!=D; j++) {
        out[i*D + j] = 5;
    }
}

This code runs fine, even for very big data sets (N=100000, D=30000). From what I understand about pointer arithmetic, this should give the same result:

for (int i=0; i!=N; i++){
    if (i % 1000 == 0){
        std::cout << "i=" << i << std::endl;
    }
    char* out2 = &out[i*D];
    for (int j=0; j!=D; j++) {
        out2[j] = 5;
    }
}

However, the latter does not work for a very big data set: it freezes at index 143886. I think it segfaults, but I'm not 100% sure, as I'm not used to developing on Windows. I'm afraid I'm missing something obvious about how pointer arithmetic works. Could it be related to advancing a char*?

EDIT: We have now established that the problem was an overflow of the index (i.e. (i*D + j) >= 2^32), so using uint64_t instead of int32_t fixed the problem. What's still unclear to me is why the first case above runs through while the second one segfaults.
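As a rough illustration of the fix described in the edit, here is a minimal sketch that computes the flat index in 64 bits before it is used as an offset. The helper names `flat_index` and `fill_rows` are hypothetical and not part of the original code, and the sketch assumes a 64-bit platform where a buffer of N*D bytes can exist.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the original post): computes i*D + j in
// 64 bits, so the product cannot overflow a 32-bit int.
inline std::int64_t flat_index(int i, int j, int D) {
    assert(i >= 0 && j >= 0 && D > 0);
    return static_cast<std::int64_t>(i) * D + j;  // widen before multiplying
}

// Hypothetical helper: fills an N x D char buffer row by row, taking each
// row pointer from a 64-bit offset instead of the int expression i*D.
inline void fill_rows(char* out, int N, int D, char value) {
    for (int i = 0; i < N; ++i) {
        char* row = out + flat_index(i, 0, D);
        for (int j = 0; j < D; ++j) {
            row[j] = value;
        }
    }
}
```

The cast has to happen before the multiplication; writing `static_cast<std::int64_t>(i * D)` would still evaluate `i * D` in `int` first.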

Upvotes: 2

Views: 1937

Answers (3)

TomF

Reputation: 183

When N is the size of an array, why use int? Does a negative array size have any logical meaning?

What do you mean by "doesn't work"?

Just think of pointers as addresses in memory and not as 'objects'.

char* 
void*
int*

are all pointers to memory addresses, so they all hold the same kind of value when they are defined or passed into a function.

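char storage[16];
char* a = storage;
int* b = (int*)a;    // an explicit cast is required between unrelated pointer types
void* c = (void*)b;

(void*)a == (void*)b && (void*)b == c;   // true: all three hold the same address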

The difference shows when you access an element: a[i] reads sizeof(*a) bytes starting at the address a + i * sizeof(*a) (counted in bytes).

And when you use ++ to advance a pointer, the address it holds is advanced by sizeof(*pointer) bytes, i.e. the size of the pointed-to type, not the size of the pointer itself.

Example:

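char* a = (char*)1;   // illustrative address only; never dereferenced
a++;

a is now 2.

a = (char*)((int*)a + 1);   // ((int*)a)++ does not compile: a cast result is not an lvalue

a is now 6 (assuming sizeof(int) == 4).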

Another thing:

char* a = (char*)10;
char* b = a + 10;

&(a[10]) == b

because in the end

a[10] == *((char*)(a + 10))

So the array size alone should not cause a problem in your example, because the two versions are equivalent.
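The two rules above (the step size depends on the pointed-to type, and a[i] is the same element as *(a + i)) can be checked with a small sketch. The helper names `stride_in_bytes` and `same_element` are hypothetical and used only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: byte distance between p and p + 1, i.e. how far
// one increment moves a pointer of type T*.
template <typename T>
std::ptrdiff_t stride_in_bytes(T* p) {
    assert(p != nullptr);
    return reinterpret_cast<char*>(p + 1) - reinterpret_cast<char*>(p);
}

// Hypothetical helper: checks that &a[i] and a + i name the same address.
// The caller must keep a + i inside (or one past the end of) the array.
inline bool same_element(char* a, std::ptrdiff_t i) {
    assert(a != nullptr);
    return &a[i] == a + i;
}
```

For a `char*` the stride is 1 byte, so advancing a `char*` by `i*D` moves exactly `i*D` bytes, provided `i*D` itself was computed without overflow.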

EDIT

Now note that a negative index is not converted to a positive one: data[a] means *(data + a), so a negative a addresses memory before data.

int a = -5;
char* data;
&data[a] == data - 5

For that reason, when signed values are used as array sizes and indices, an index that has overflowed into a negative value may mean that your two examples do not actually get the same result.
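For reference, C++ defines p[k] as *(p + k) with k kept as a signed offset, so a negative index reaches backwards from p. A minimal sketch, using a hypothetical helper `read_relative` and assuming the caller keeps p + offset inside the same array:

```cpp
#include <cassert>

// Hypothetical helper: reads p[offset], where offset may be negative.
// The caller must ensure p + offset stays inside the same array;
// anything else is undefined behavior.
inline char read_relative(const char* p, int offset) {
    assert(p != nullptr);
    return p[offset];  // same as *(p + offset); offset stays signed
}
```

With a pointer into the middle of a buffer, `read_relative(mid, -2)` returns the element two positions before `mid`. When the base pointer is the start of the buffer, as with `out` in the question, a negative offset points outside the allocation.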

Upvotes: 1

Zac Howland

Reputation: 15872

Version 1

for (int i=0; i!=N; i++) // i starts at 0 and increments until it equals N.  Note:  if i ever skips past N, this loops forever.  Prefer i < N instead
{
    if (i % 1000 == 0) // if i is a multiple of 1000
    {
        std::cout << "i=" << i << std::endl; // print i
    }

    for (int j=0; j!=D; j++) // same as with i, only j goes up to D (same caveat: prefer j < D)
    {
        out[i*D + j] = 5; // this is a way of faking a 2D array by making a large 1D array and doing the math yourself to offset the placement
    }
}

Version 2

for (int i=0; i!=N; i++) // same as before
{
    if (i % 1000 == 0) // same as before
    {
        std::cout << "i=" << i << std::endl; // same as before
    }

    char* out2 = &out[i*D]; // store the location of out[i*D]
    for (int j=0; j!=D; j++) 
    {
        out2[j] = 5; // set out[i*D+j] = 5;
    }
}

They do the same thing, but if out is not large enough, both have undefined behavior (and will likely crash).

Upvotes: -1

Alan Stokes

Reputation: 18964

N * D is 3e9; that doesn't fit in a 32-bit int.
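To make the arithmetic concrete, here is a small sketch that does the check in 64 bits. The helper names `fits_in_int32` and `first_overflowing_row` are hypothetical, and the values in the usage note use N=100000 and D=30000 from the question.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: true if v is representable as a 32-bit signed int.
inline bool fits_in_int32(std::int64_t v) {
    return v >= std::numeric_limits<std::int32_t>::min() &&
           v <= std::numeric_limits<std::int32_t>::max();
}

// Hypothetical helper: smallest row index i for which the row offset i*D
// is larger than the maximum 32-bit signed value.
inline std::int64_t first_overflowing_row(std::int64_t D) {
    assert(D > 0);
    return std::numeric_limits<std::int32_t>::max() / D + 1;
}
```

With D = 30000, `first_overflowing_row(30000)` is 71583, since 71583 * 30000 = 2,147,490,000 exceeds 2,147,483,647. From that row on, evaluating `i*D` in `int` is signed overflow, which is undefined behavior in C++.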

Upvotes: 4
