Angus

Reputation: 12621

Understanding the source code of memcpy()

void *memcpy(void *dst, const void *src, size_t len)
{
        size_t i;

        /*
         * memcpy does not support overlapping buffers, so always do it
         * forwards. (Don't change this without adjusting memmove.)
         *
         * For speedy copying, optimize the common case where both pointers
         * and the length are word-aligned, and copy word-at-a-time instead
         * of byte-at-a-time. Otherwise, copy by bytes.
         *
         * The alignment logic below should be portable. We rely on
         * the compiler to be reasonably intelligent about optimizing
         * the divides and modulos out. Fortunately, it is.
         */

        if ((uintptr_t)dst % sizeof(long) == 0 &&
            (uintptr_t)src % sizeof(long) == 0 &&
            len % sizeof(long) == 0) {

                long *d = dst;
                const long *s = src;

                for (i=0; i<len/sizeof(long); i++) {
                        d[i] = s[i];
                }
        }
        else {
                char *d = dst;
                const char *s = src;

                for (i=0; i<len; i++) {
                        d[i] = s[i];
                }
        }

        return dst;
}

I was just going through an implementation of memcpy to understand how it differs from using a loop. But I couldn't see any difference between using a loop and using memcpy, as memcpy itself uses a loop internally to copy.

I couldn't understand the if part they do for integers: i < len/sizeof(long). Why is this calculation required?

Upvotes: 26

Views: 67941

Answers (6)

Pankaj Suryawanshi

Reputation: 743

If you look at the assembly code of memcpy, you will see that on a 32-bit system each register is 32 bits wide, so it can hold 4 bytes at a time. If you copy only one byte per 32-bit register, the CPU needs extra instruction cycles.

If len is a multiple of 4 (and the pointers are aligned), we can copy 4 bytes per iteration:

    MOV FROM, R2
    MOV TO,   R3
    MOV R2,   R4
    ADD LEN,  R4
CP: MOV (R2+), (R3+) ; "(Rx+)" means "*Rx++" in C
    CMP R2, R4
    BNE CP
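For readers who do not read assembly, the loop above can be sketched in C roughly as follows. This is only an illustrative rendering: the names copy_units and unit are hypothetical, and it assumes one MOV transfers one long and that the count is given in units rather than bytes.

```c
#include <stddef.h>

/* Hypothetical type: stands in for whatever one MOV transfers. */
typedef long unit;

/* Hypothetical helper: a C rendering of the assembly loop above. */
void copy_units(unit *to, const unit *from, size_t count)
{
    const unit *end = from + count;   /* MOV R2,R4 ; ADD LEN,R4 */

    /* Checked before the first copy, so count == 0 is safe.
     * The assembly copies first and compares afterwards. */
    while (from != end) {             /* CMP R2,R4 ; BNE CP */
        *to++ = *from++;              /* MOV (R2+),(R3+) */
    }
}
```

Each pass through the loop moves one whole unit, so the wider the unit, the fewer passes are needed for the same number of bytes.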

Upvotes: 0

Subham Sarda

Reputation: 31

I was just going through an implementation of memcpy to understand how it differs from using a loop. But I couldn't see any difference between using a loop and using memcpy, as memcpy itself uses a loop internally to copy.

Loops (control statements) are one of the basic building blocks of a language, alongside if (decision statements) and a few other constructs. So the question here is not really about the difference between ordinary looping and using memcpy.

memcpy simply makes your task easier by giving you a ready-to-use API call, instead of making you write 20 lines of code for a trivial thing. If you wish, you can write your own code that provides the same functionality.

The second point, as already noted in other answers, is the optimization it provides for the long case. In that branch it copies a whole block of data at once (what we call a word) instead of copying byte by byte, which would take longer. Where the byte loop would need 8 iterations (assuming an 8-byte long), the long loop does the same work in a single iteration by copying the word at once.
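The iteration counts can be compared with a small sketch. The two helper names below are hypothetical and exist only for illustration; the "8 to 1" ratio holds only where sizeof(long) is 8, which is platform-dependent.

```c
#include <stddef.h>

/* Hypothetical helpers for illustration; not part of any libc. */

/* Iterations the byte loop needs: one per byte. */
size_t byte_loop_iterations(size_t len)
{
    return len;
}

/* Iterations the long loop needs: one per long.
 * Only meaningful when len is a multiple of sizeof(long). */
size_t word_loop_iterations(size_t len)
{
    return len / sizeof(long);
}
```

For len equal to sizeof(long), the byte loop runs sizeof(long) times while the long loop runs once.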

Upvotes: 0

huseyin tugrul buyukisik

Reputation: 11910

len % sizeof(long) checks that you are copying a whole number of longs, not a fraction of one.

        if ((uintptr_t)dst % sizeof(long) == 0 &&
            (uintptr_t)src % sizeof(long) == 0 &&
            len % sizeof(long) == 0) {

                long *d = dst;
                const long *s = src;

                for (i=0; i<len/sizeof(long); i++) {
                        d[i] = s[i];
                }
        }

This checks for alignment and, if everything is aligned, copies fast (sizeof(long) bytes at a time).

        else {
                char *d = dst;
                const char *s = src;

                for (i=0; i<len; i++) {
                        d[i] = s[i];
                }
        }

This handles misaligned buffers (or a length that is not a multiple of sizeof(long)) with a slow copy, 1 byte at a time.
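The condition that picks between the two branches can be pulled out into a predicate to experiment with. The function name can_copy_by_long is hypothetical; the expression itself mirrors the if in the quoted memcpy.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper mirroring the if condition in the quoted memcpy.
 * Returns 1 when the fast long-at-a-time branch would be taken. */
int can_copy_by_long(const void *dst, const void *src, size_t len)
{
    return (uintptr_t)dst % sizeof(long) == 0 &&   /* dst is aligned */
           (uintptr_t)src % sizeof(long) == 0 &&   /* src is aligned */
           len % sizeof(long) == 0;                /* whole longs only */
}
```

On typical platforms, two long arrays copied in full pass the check, while shifting either pointer by one byte or trimming one byte off the length sends the copy down the byte-at-a-time branch.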

Upvotes: 6

m0skit0

Reputation: 25863

to understand how it differs from using a loop. But I couldn't see any difference between using a loop and using memcpy, as memcpy itself uses a loop internally to copy

Well then, it uses a loop. Maybe other implementations of libc don't do it like that. Anyway, what's the problem/question if it does use a loop? Also, as you can see, it does more than just loop: it checks for alignment and performs a different kind of loop depending on the alignment.

I couldn't understand the if part they do for integers: i < len/sizeof(long). Why is this calculation required?

The if condition is checking for memory word alignment. If the destination and source addresses are word-aligned, and the length to copy is a multiple of the word size, then it performs an aligned copy by word (long), which is faster than copying by bytes (char), not only because of the size, but also because most architectures handle word-aligned copies much faster.

Upvotes: 6

Yu Hao

Reputation: 122363

for (i=0; i<len/sizeof(long); i++) {
    d[i] = s[i];
}

In this for loop, each iteration copies one long, and there are len bytes in total to copy, so the loop must run len/sizeof(long) times. That is why i<len/sizeof(long) is the condition that terminates the loop.

Upvotes: 4

Andreas Fester

Reputation: 36630

I couldn't understand the if part they do for integers: i < len/sizeof(long). Why is this calculation required?

Because they are copying words, not individual bytes, in this case (as the comment says, it is an optimization: it requires fewer iterations, and the CPU can handle word-aligned data more efficiently).

len is the number of bytes to copy, and sizeof(long) is the size of a single word, so the number of elements to copy (that is, the number of loop iterations to execute) is len / sizeof(long).
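A minimal sketch of the word branch on its own makes the units visible. The function copy_longs is hypothetical, and it assumes both buffers really hold long objects and that len is a multiple of sizeof(long). Since d[i] indexes in longs, using i < len as the bound would run sizeof(long) times too far and access memory past the end of both buffers.

```c
#include <stddef.h>

/* Hypothetical sketch of the word branch only.
 * Assumes dst and src point to long objects and that len (in bytes)
 * is a multiple of sizeof(long). */
size_t copy_longs(long *dst, const long *src, size_t len)
{
    size_t i;
    size_t count = len / sizeof(long);   /* elements, not bytes */

    for (i = 0; i < count; i++) {
        dst[i] = src[i];                 /* moves sizeof(long) bytes */
    }
    return i * sizeof(long);             /* bytes actually copied */
}
```

The return value equals len whenever len is a multiple of sizeof(long), which is a quick way to confirm that dividing by sizeof(long) gives the right iteration count.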

Upvotes: 20
