Reputation: 12621
#include <stddef.h>   /* size_t */
#include <stdint.h>   /* uintptr_t */

void *memcpy(void *dst, const void *src, size_t len)
{
    size_t i;

    /*
     * memcpy does not support overlapping buffers, so always do it
     * forwards. (Don't change this without adjusting memmove.)
     *
     * For speedy copying, optimize the common case where both pointers
     * and the length are word-aligned, and copy word-at-a-time instead
     * of byte-at-a-time. Otherwise, copy by bytes.
     *
     * The alignment logic below should be portable. We rely on
     * the compiler to be reasonably intelligent about optimizing
     * the divides and modulos out. Fortunately, it is.
     */

    if ((uintptr_t)dst % sizeof(long) == 0 &&
        (uintptr_t)src % sizeof(long) == 0 &&
        len % sizeof(long) == 0) {

        long *d = dst;
        const long *s = src;

        for (i=0; i<len/sizeof(long); i++) {
            d[i] = s[i];
        }
    }
    else {
        char *d = dst;
        const char *s = src;

        for (i=0; i<len; i++) {
            d[i] = s[i];
        }
    }

    return dst;
}
I was just going through an implementation of memcpy to understand how it differs from using a loop. But I couldn't see any difference between using a loop and using memcpy, as memcpy itself uses a loop internally to copy.
I couldn't understand the if part they do for integers, i < len/sizeof(long). Why is this calculation required?
Upvotes: 26
Views: 67941
Reputation: 743
If you look at the assembly code for memcpy, you can see that on a 32-bit system each register is 32 bits wide and can hold 4 bytes at a time. If you copy only one byte at a time through a 32-bit register, the CPU needs extra instruction cycles.
If len (the count) is a multiple of 4, we can copy 4 bytes with a single move instruction:
MOV FROM, R2
MOV TO, R3
MOV R2, R4
ADD LEN, R4
CP: MOV (R2+), (R3+) ; "(Rx+)" means "*Rx++" in C
CMP R2, R4
BNE CP
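For readers more at home in C, here is a rough C rendering of that loop. It is a sketch under assumptions: the helper name copy_words32 is hypothetical, the buffers are taken to be arrays of uint32_t so that each move transfers one 4-byte word, and the count is given in words rather than bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* C rendering of the assembly loop: walk two pointers with
 * post-increment until the source pointer reaches its end.
 * Hypothetical helper; assumes uint32_t arrays, so each iteration
 * moves one 4-byte word. Returns the number of moves performed. */
size_t copy_words32(uint32_t *to, const uint32_t *from, size_t nwords)
{
    const uint32_t *end = from + nwords;  /* like R4 = R2 + LEN */
    size_t moves = 0;

    /* Test before the first move so nwords == 0 copies nothing;
     * the assembly branches at the bottom, so it always moves once. */
    while (from != end) {
        *to++ = *from++;                  /* MOV (R2+), (R3+) */
        moves++;
    }
    return moves;
}
```

The end pointer plays the role of R4, and the comparison against it corresponds to CMP R2, R4 followed by BNE CP.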
Upvotes: 0
Reputation: 31
I was just going through an implementation of memcpy to understand how it differs from using a loop. But I couldn't see any difference between using a loop and using memcpy, as memcpy uses a loop internally to copy.
Loops (control statements) are one of the basic building blocks of a program, alongside if (decision statements) and a few other such constructs. So the real question here is not what the difference is between a normal loop and memcpy.
memcpy just helps you by providing a ready-to-use API call, instead of making you write 20 lines of code for a petty task. If you wish, you can write your own code that provides the same functionality.
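A minimal sketch of writing it yourself, checked against the library memcpy. The name my_copy is hypothetical; it copies one byte per iteration, without the long-sized fast path of the implementation in the question, and, like memcpy, it assumes the buffers do not overlap.

```c
#include <stddef.h>
#include <string.h>

/* Hand-written byte-by-byte copy. Named my_copy (hypothetical) to
 * avoid clashing with the standard library's memcpy. The buffers
 * must not overlap. Returns dst, as memcpy does. */
void *my_copy(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    for (size_t i = 0; i < len; i++)
        d[i] = s[i];
    return dst;
}
```

Either call leaves the destination with the same bytes; the library version is simply already written, and may be much faster.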
The second point, as already pointed out, is the optimization it provides by choosing between the long data type and char. With long it copies a block of data at once, what we call a word, instead of copying byte by byte, which would take longer. Where sizeof(long) is 8 (as on 64-bit Linux and macOS), memcpy does in a single iteration what would otherwise take 8 byte-copy iterations.
Upvotes: 0
Reputation: 11910
len % sizeof(long) checks whether you are copying whole longs, not part of a long.
if ((uintptr_t)dst % sizeof(long) == 0 &&
    (uintptr_t)src % sizeof(long) == 0 &&
    len % sizeof(long) == 0) {

    long *d = dst;
    const long *s = src;

    for (i=0; i<len/sizeof(long); i++) {
        d[i] = s[i];
    }
}
checks for alignment (of both pointers and of the length) and, if everything is aligned, copies fast (sizeof(long) bytes at a time).
else {
    char *d = dst;
    const char *s = src;

    for (i=0; i<len; i++) {
        d[i] = s[i];
    }
}
This is for misaligned arrays: a slow copy, 1 byte at a time.
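The branch decision can be isolated into a small predicate. This is a sketch: can_copy_by_words is a hypothetical name, and the body simply restates the three conditions from the quoted if, so a false result for any one of them sends the copy down the byte-at-a-time path.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Restates the if condition from the quoted memcpy: the fast path is
 * used only when both addresses are multiples of sizeof(long) and len
 * is a whole number of longs. Hypothetical helper name. */
bool can_copy_by_words(const void *dst, const void *src, size_t len)
{
    return (uintptr_t)dst % sizeof(long) == 0 &&
           (uintptr_t)src % sizeof(long) == 0 &&
           len % sizeof(long) == 0;
}
```

Shifting one pointer by a single byte, or asking for one byte more than a whole number of longs, is enough to make it return false.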
Upvotes: 6
Reputation: 25863
to understand how it differs from using a loop. But I couldn't see any difference between using a loop and memcpy, as memcpy uses a loop internally to copy
Well, then it uses a loop. Maybe other libc implementations don't do it like that. Anyway, what's the problem if it does use a loop? Also, as you can see, it does more than a loop: it checks for alignment and performs a different kind of loop depending on the alignment.
I couldn't understand the if part they do for integers, i < len/sizeof(long). Why is this calculation required?
This is checking for memory word alignment. If the destination and source addresses are word-aligned, and the length to copy is a multiple of the word size, then it performs an aligned copy by word (long), which is faster than copying bytes (char), not only because of the size, but also because most architectures do word-aligned copies much faster.
Upvotes: 6
Reputation: 122363
for (i=0; i<len/sizeof(long); i++) {
    d[i] = s[i];
}
In this for loop, each iteration copies one long, and there are len bytes to copy in total; that's why it needs i < len/sizeof(long) as the condition to terminate the loop.
Upvotes: 4
Reputation: 36630
I couldn't understand the if part they do for integers, i < len/sizeof(long). Why is this calculation required?
Because in this case they are copying words, not individual bytes (as the comment says, it is an optimization: it requires fewer iterations, and the CPU can handle word-aligned data more efficiently).
len is the number of bytes to copy, and sizeof(long) is the size of a single word, so the number of elements to copy (that is, loop iterations to execute) is len / sizeof(long).
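The arithmetic can be written out as two tiny helpers giving the loop-iteration count of each branch for a byte count len. Both names are hypothetical, and the word-path count is only meaningful when len % sizeof(long) == 0, which is exactly when that branch is taken.

```c
#include <stddef.h>

/* Iterations of the word loop: each one copies sizeof(long) bytes.
 * Only meaningful when len is a multiple of sizeof(long). */
size_t word_path_iterations(size_t len)
{
    return len / sizeof(long);
}

/* Iterations of the byte loop: each one copies a single byte. */
size_t byte_path_iterations(size_t len)
{
    return len;
}
```

For example, with an 8-byte long, copying 64 bytes takes 64 / 8 = 8 iterations on the word path versus 64 on the byte path, and 8 iterations times 8 bytes covers exactly the 64 bytes requested.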
Upvotes: 20