NoobCoder

Reputation: 615

Implementation of memset to set a whole word instead of byte by byte in C

I'm trying to implement my own MemSet that does the same as memset, but also sets a whole word at a time where possible and takes the alignment of dest into account.

This is my code:

void *MemSet(void *dest, int c, size_t n)
{
    unsigned char *runner = (unsigned char *)dest;
    
    size_t i = 0;
    
    unsigned char swap_word[sizeof(size_t)];
    
    for (i = 0; i < sizeof(size_t); ++i)
    {
        swap_word[i] = (unsigned char)c;
    }
    
    if (NULL == dest)
    {
        return (NULL);
    }
    
    while (n > 0)
    {
        /* setting byte by byte */
        if (n < sizeof(size_t) || (((size_t)runner & (sizeof(size_t) - 1)) != 0))
        {
            *runner++ = (unsigned char)c;
            --n;
            printf("Byte written\n"); /* for debugging */
        }
        else
        {
            /* setting a whole word */
            *((void **)runner) = *((void **)swap_word);
            runner += sizeof(size_t);
            n -= sizeof(size_t);
            printf("Word written\n"); /* for debugging */
        }
    }
    return (dest);
}

What am I doing here?

and this is my test file:

#include <stdio.h>
#include <stddef.h>

void *MemSet(void *dest, int c, size_t n);

int array[] = { 2, 3 };

int main(void)
{
    int i;

    for (i = 0; i < 2; i++)
    {
        printf("Before MemSet, target is \"%d\"\n\n", array[i]);
    }
    if (NULL == MemSet(array, 3, 2 * sizeof(int)))
    {
        fprintf(stderr, "MemSet failed!\n");
    }
    for (i = 0; i < 2; i++)
    {
        printf("After MemSet, target is \"%d\"\n\n", array[i]);
    }
    return (0);
}

Output is:

Before MemSet, target is "2"

Before MemSet, target is "3"

Word written
After MemSet, target is "50529027"

After MemSet, target is "50529027"

Why aren't both elements 3? I'm calling:

MemSet(array, 3, 2 * sizeof(int))

In theory, this should set both elements to 3, because the array occupies 2 * sizeof(int) bytes in memory and I set all of them to 3.

What do you think? And also, how can I check if my alignment works?

Thanks.

Upvotes: 2

Views: 1988

Answers (1)

chqrlie

Reputation: 144949

Your function has multiple problems:

  • you test for a possible word-size store on every iteration, which is likely slower than the simple byte-by-byte loop.

  • *((void **)runner) = *((void **)swap_word); is incorrect because it violates the strict aliasing rule and because swap_word might not be correctly aligned for the void * type.
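One way to sidestep both the aliasing and the alignment problem of that word store is to build the word as an integer and copy it with memcpy, which makes no assumption about the type or alignment of either side. The following is only a minimal sketch: store_word is a hypothetical helper name, and the replication formula assumes uintptr_t has no padding bits. Compilers typically turn a fixed-size memcpy like this into a single store, although that is not guaranteed.

```c
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of the original code): writes one word whose
   bytes all equal (unsigned char)c at dst and returns the position after it.
   dst may have any alignment, because memcpy copies byte-wise semantics. */
unsigned char *store_word(unsigned char *dst, int c) {
    /* UINTPTR_MAX / UCHAR_MAX is 0x0101...01; multiplying by the byte value
       replicates it into every byte (assumes no padding bits in uintptr_t) */
    uintptr_t w = UINTPTR_MAX / UCHAR_MAX * (unsigned char)c;
    memcpy(dst, &w, sizeof w);
    return dst + sizeof w;
}
```

The swap_word buffer from the question becomes unnecessary with this approach, since the word value is computed directly.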

You should run separate loops:

  • the first one to align the destination pointer
  • the second one to set full words, possibly more than one at a time
  • the last one to set the trailing bytes if any

Here is an example:

#include <limits.h>
#include <stdio.h>
#include <stdint.h>

// assuming uintptr_t has no padding bits
void *MemSet(void *dest, int c, size_t n) {
    if (dest != NULL) {
        unsigned char *p = dest;
        if (n >= sizeof(uintptr_t)) {
            // align destination pointer
            // this test is not fully defined but works on all classic targets
            while ((uintptr_t)p & (sizeof(uintptr_t) - 1)) {
                *p++ = (unsigned char)c;
                n--;
            }
            // compute word value (generalized chux formula)
            uintptr_t w = UINTPTR_MAX / UCHAR_MAX * (unsigned char)c;
            // added a redundant (void *) cast to prevent compiler warning
            uintptr_t *pw = (uintptr_t *)(void *)p;
            // set 16 or 32 bytes at a time
            while (n >= 4 * sizeof(uintptr_t)) {
                pw[0] = w;
                pw[1] = w;
                pw[2] = w;
                pw[3] = w;
                pw += 4;
                n -= 4 * sizeof(uintptr_t);
            }
            // set the remaining 0 to 3 words
            while (n >= sizeof(uintptr_t)) {
                *pw++ = w;
                n -= sizeof(uintptr_t);
            }
            p = (unsigned char *)pw;
        }
        // set the trailing bytes
        while (n-- > 0) {
            *p++ = (unsigned char)c;
        }
    }
    return dest;
}

Note however that the above code is unlikely to beat memset() because:

  • the compiler may expand memset() inline for constant sizes, skipping the alignment tests if the destination pointer is known to be aligned or if the CPU allows unaligned access.
  • the library may use specialized instructions such as SIMD or REP/STOS to increase throughput depending on the actual target CPU.
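As for checking whether the alignment handling works: one possible approach is to call the function at every start offset within a buffer, for a range of lengths, and compare the result with what the standard memset produces. The sketch below is an assumption about how such a check could look, not an established test suite; check_memset, overrun_memset, and the constants are hypothetical names. Guard bytes around the target region make out-of-bounds writes visible.

```c
#include <stddef.h>
#include <string.h>

/* Signature shared by memset and the function under test. */
typedef void *(*memset_fn)(void *dest, int c, size_t n);

enum { MAX_OFFSET = 16, MAX_LEN = 64, GUARD = 0xAA, FILL = 0x5C };

/* Hypothetical checker: returns the number of failing (offset, length) cases.
   Offsets 0..MAX_OFFSET-1 cover every possible misalignment of the start. */
int check_memset(memset_fn fn) {
    unsigned char got[MAX_OFFSET + MAX_LEN + 16];
    unsigned char want[sizeof got];
    int failures = 0;

    for (size_t off = 0; off < MAX_OFFSET; off++) {
        for (size_t len = 0; len <= MAX_LEN; len++) {
            /* guard bytes expose writes outside [off, off + len) */
            memset(got, GUARD, sizeof got);
            memset(want, GUARD, sizeof want);
            memset(want + off, FILL, len);
            void *ret = fn(got + off, FILL, len);
            if (ret != got + off || memcmp(got, want, sizeof got) != 0) {
                failures++;
            }
        }
    }
    return failures;
}

/* Deliberately buggy example: rounds the length up to a whole word, so it
   writes past the end whenever n is not a multiple of the word size. */
void *overrun_memset(void *dest, int c, size_t n) {
    size_t word = sizeof(size_t);
    return memset(dest, c, (n + word - 1) / word * word);
}
```

Calling check_memset(MemSet) should return 0 for a correct implementation, while check_memset(overrun_memset) returns a non-zero count because the guard bytes after the region get overwritten.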

The reason for the surprising result is that memset works on bytes, not on ints: an int spans 4 bytes here, each of which gets set to 3, so the resulting value of each integer is 0x03030303, which is exactly 50529027.
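To make the arithmetic concrete, here is a small sketch with two hypothetical helpers: int_of_repeated_byte reproduces the value memset leaves in a single int, and fill_ints shows the element-wise loop to use when the goal is for every int to equal 3. The value 50529027 assumes a 4-byte int with 8-bit bytes.

```c
#include <string.h>

/* Hypothetical helper: the value one int holds after memset fills it with c.
   With a 4-byte int and c == 3 this is 0x03030303, i.e. 50529027. */
int int_of_repeated_byte(int c) {
    int v;
    memset(&v, c, sizeof v);
    return v;
}

/* Hypothetical helper: assigns value to each of the count elements.
   Unlike memset, this sets whole ints rather than individual bytes. */
void fill_ints(int *a, size_t count, int value) {
    for (size_t i = 0; i < count; i++) {
        a[i] = value;
    }
}
```

With these, fill_ints(array, 2, 3) leaves both elements equal to 3. A byte-wise fill only yields the intended int value for patterns where every byte is identical, such as 0 or, on two's complement machines, -1.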

Upvotes: 4
