user12400828
user12400828

Reputation:

Counting the number of bits of value 1 in an array - Understanding the code behind it (C)

The following code counts the number of bits of value '1' in the array 'buf':

long bit_count(long *arr, int size)
{
    int w,b;
    unsigned long word=0;
    long n = 0;

    for(w = 0; w<size ; w++){
        word = arr[w];
        while(word != 0){
            n += word & 1;
            word >>= 1;
        }
    }
    return n;
}

int main() {

     char buf[] = {"There is always one more!"};         //(1)
     int length = strlen(buf)>>3;                        //(2)
     long *a = (long*) (buf + length);                   //(3)
     //char * b = (char *)a; // this is a comment.       //(4)
     int len = strlen((char*)a);                         //(5)
     long total_count = bit_count((long*) buf, length);  //(6)
     *a = (*a) & (((long)1<<(len*8))-1);                 //(7)
     char *c = (char*)a;                                 //(8)
     total_count += bit_count(a,1);                      //(9)
     printf("bits number %ld\n",total_count);            //(10)

    return 0;
}

That's my understanding of the code: (please correct me if I'm wrong, I want to understand it at the deepest level)

In line (2) they compute the number of letters in buf which is 25 and it's equivalent to 25 bytes. cause the size of char is 1 byte. then by shifting that number 3 bits right they dividing it by 8, 8 defines 8 bytes = 1 long type. so how many "longs" this buf array contains.

line (3) - 'a' points to a value that contains the casting of the remainder of buf to (long *) - in line (4) I saw that b* value will be "!" which is exactly the remainder of 1 byte in the array. but I don't really understand the casting to (long *) the value I'm getting is different if I check

long check = (long) (buf + length);

so what exactly happens when I do - {(long*)buf} is my first question.

in line (6) I count the bits in the first part of the array (by longs) and in line (9) I count the bits in the remainder.

what I don't understand is - what happens in line (7)?

Upvotes: 2

Views: 276

Answers (1)

Eric Postpischil
Eric Postpischil

Reputation: 224586

There is nothing tricky about the code, but it is not very good. Actually, it is very bad, and not because of style but because it corrupts memory.

(2)  int length = strlen(buf)>>3;

This line calculates the whole number of long objects that fit in the string, assuming that long is eight bytes (/ sizeof(long) should be used instead of >>3) and that int is big enough to hold the number (size_t should be used instead of int).

(3)  long *a = (long*) (buf + length);

This seems to intend to set a to point to the fragment at the end of the string that is not long enough to contain another long object. However, because length is a number of long objects and buf acts as a pointer to char, its arithmetic is incorrect. It calculates buf plus length char objects when it intends to calculate buf plus length long objects. It ought to use (long *) buf + length. Also, it assumes that a char * with arbitrary alignment can be sensibly converted to a long *, which is not guaranteed by the C standard.

(5)  int len = strlen((char*)a);

This calculates the number of bytes in the fragment. It is a waste because the code previously calculated the length of the string. It could have saved that length and taken the quotient and remainder modulo the size of long rather than calling strlen again.

(6)  long total_count = bit_count((long*) buf, length);

This calculates the number of bits set in the main portion of the buffer, that is, the portion that is treated as some whole number of long objects. As with the earlier conversion, it assumes that a char * can be converted to a long *, and, further, the routine it calls uses that pointer to read long objects from an object not defined as an array of long, thus violating C aliasing rules.

(7)  *a = (*a) & (((long)1<<(len*8))-1);

Recall that a points to some number of trailing bytes in the string. (long)1<<(len*8) moves a bit to just above that number of bytes, assuming bytes are eight bits (CHAR_BIT should be used instead of 8). (For example, if there are two bytes, this creates the value 1000016.) Then subtracting one creates a bit mask for the bytes (FFFF16 in the example). Then *a attempts to read a long from end of the string, after which applying the mask with & reduces the value read to just the bytes in the string. This not only violates C’s aliasing rules again but also reads outside the memory of buf, which has undefined behavior. Finally, the statement writes the updated value to *a, which is even worse—if you got away with reading outside of buf, now the code is writing outside of buf, causing corruption somewhere unknown.

(8)  char *c = (char*)a;

This line is useless since c is never used.

(9)  total_count += bit_count(a,1);

This counts the number of bits set in the long that was updated in (7).

In the bit_count routine, we find the code is counting bits one-by-one. This is absurd. Violating the C rules about aliasing might have made sense to gain some performance benefit of processing whole long objects instead of individual char objects if the specific C implementation being used allowed the aliasing with unaligned addresses. But this code just does bit-by-bit processing; it does not gain any advantage from some bit-count (also known as population-count) instruction on the processor.

Furthermore, the bit_count routine recognizes that it ought to use unsigned long for the bit count, as it defines word as unsigned long. But it fails to recognize it ought to have used unsigned long throughout, as the long could have a trap representation that would cause the code to fail.

It also defines b without ever using it.

Upvotes: 3

Related Questions