SirRatty

Reputation: 2406

C to PHP, character processing

I have some legacy C code (as a macro) that I am not allowed to change in any way, or replace.

This code (eventually) outputs a digest (C) string based on the source string, updating the hash value once for each character in the string.

#define DO_HASH(src, dest) { \
    unsigned long hash = 1111; // Seed. You must NOT change this. \
    char c, *srcPtr; \
    int i; \
    unsigned char hashedChar; \
    \
    srcPtr = src; \
    c = *srcPtr++; \
    while ( c) { \
            hash = ((hash << 5) + hash) + c; \
            c = *srcPtr++; \
    } \
    ... // etc.

} // 

Some years back, I had to implement it in PHP, as a function returning a digest string. The PHP function has to reproduce the C results identically.

function php_DO_HASH($srcStr)
{
    $hash = 1111;       // Seed. You must NOT change this.
    $index = 0;
    $c = $srcStr[$index];

    while ($c) {
        $hash = (($hash << 5) + $hash) + ord($c);
        $index++;
        $c = $srcStr[$index];
    }

    ... // etc.
}

This has worked successfully for some years. However, in the last few days my server host upgraded to a new version of CentOS, but says they did not change the version of PHP. Since then, the C and PHP versions have produced different output.

Could anyone please advise as to what I'm doing wrong in the PHP version? Thanks.

Upvotes: 2

Views: 193

Answers (4)

caf

Reputation: 239171

You are running into the same PHP overflow problem (where the behaviour varies between versions) as this question. The accepted answer there has all the gory details, including this truncate-to-32-bits function which apparently works on all versions of PHP:

function thirtyTwoBitIntval($value)
{
    if ($value < -2147483648)
    {
        return -(-($value) & 0xffffffff);
    }
    elseif ($value > 2147483647)
    {
        return ($value & 0xffffffff);
    }
    return $value;
}

If you pass your hash value through that thirtyTwoBitIntval() function every time it is recalculated, i.e.:

$hash = thirtyTwoBitIntval(($hash << 5) + $hash + ord($c));

then it should fix the problem.

Upvotes: 0

VolkerK

Reputation: 96159

The while conditions of your C and PHP versions differ.
The C version stops at a '\0' character (ord('\0')===0, zero-terminated string) while the PHP version doesn't. On the other hand, the PHP version will stop at a '0' character (ord('0')===48) while the C version doesn't.
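To make the difference concrete, here is a minimal C sketch of the loop condition only. The helper name chars_visited is made up for illustration and is not part of the original macro. It counts how many characters a `while (c)` loop visits: a '0' digit is an ordinary character in C, and only the NUL terminator ends the loop.

```c
#include <stddef.h>

/* Hypothetical helper: counts the characters visited by a C-style
 * `while (c)` loop shaped like the one in the DO_HASH macro. */
static size_t chars_visited(const char *src)
{
    size_t n = 0;
    char c = *src++;
    while (c) {          /* false only for '\0' (value 0); '0' has value 48 */
        n++;
        c = *src++;
    }
    return n;
}
```

With this, chars_visited("a0b") is 3, whereas a PHP loop written as `while ($c)` would stop after the first character of the same input, because the string "0" is falsy in PHP. A PHP port would probably need to loop on the string length instead.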

Edit: There might also be an issue with value ranges and type conversion. PHP has no unsigned long type, and it converts an integer to a float when the result of an addition is bigger than PHP_INT_MAX, e.g.:

var_dump(PHP_INT_MAX);
var_dump(PHP_INT_MAX + 1);

prints (on my 32-bit machine):

int(2147483647)
float(2147483648)

I think the next << "fixes" that problem (since PHP converts the float back to an int in a way that "works" with your algorithm). But depending on what you're doing with $hash after the loop, this could be a problem.

Upvotes: 1

hobbs

Reputation: 240264

Perhaps they changed to a 64-bit system? You should try bitwise-ANDing the hash value with 0xffffffff after each round.
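As a rough illustration of why the mask helps, the C sketch below runs the loop from the question three ways: with a 32-bit accumulator, with a 64-bit accumulator masked after each round, and with an unmasked 64-bit accumulator. The function names are invented for this example. It also assumes the original macro ran with a 32-bit unsigned long and that the input is plain ASCII (bytes of 0x80 and above depend on whether char is signed on the original platform).

```c
#include <stdint.h>

/* Reference: 32-bit accumulator, wraps modulo 2^32 on its own.
 * Assumption: this matches a platform where unsigned long is 32 bits. */
static uint32_t hash32(const char *s)
{
    uint32_t hash = 1111;                     /* seed from the question */
    while (*s) {
        hash = ((hash << 5) + hash) + (unsigned char)*s++;
    }
    return hash;
}

/* 64-bit accumulator, truncated to 32 bits after every round. */
static uint32_t hash64_masked(const char *s)
{
    uint64_t hash = 1111;
    while (*s) {
        hash = ((hash << 5) + hash) + (unsigned char)*s++;
        hash &= 0xffffffffULL;                /* keep only the low 32 bits */
    }
    return (uint32_t)hash;
}

/* 64-bit accumulator with no mask: grows past 32 bits on longer input. */
static uint64_t hash64_unmasked(const char *s)
{
    uint64_t hash = 1111;
    while (*s) {
        hash = ((hash << 5) + hash) + (unsigned char)*s++;
    }
    return hash;
}
```

For any input, hash64_masked returns the same value as hash32, while hash64_unmasked exceeds 0xffffffff as soon as the string is a handful of characters long. Masking every round also keeps the intermediate value small, which on a 64-bit PHP build should keep it well below PHP_INT_MAX so it never turns into a float.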

Upvotes: 2

Carl Smotricz

Reputation: 67780

I don't know much about PHP, but I seem to recall you can choose whether array indices start at 0 or 1. It might be worthwhile to check this, and whether this default has changed for your implementation.

I believe there's a variable to set to force this to what you want, though.


Also, the while $c looks to be very literally translated from C. Are you sure there's still a null character at the end of the string to terminate the loop?

Upvotes: 0
