Reputation: 5676
In Rust, there are two methods to update the content of a slice from another slice: clone_from_slice() and copy_from_slice(). The behavior of these two functions is unsurprising: the first does a clone and expects the type to implement Clone, while the second does a copy and expects the type to implement Copy.
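For concreteness, here is a minimal sketch of the two calls. The helper names filled_by_clone and filled_by_copy are made up for illustration and are not part of the standard library; only clone_from_slice and copy_from_slice are real slice methods.

```rust
/// Hypothetical helper: builds a buffer and fills it with `clone_from_slice`.
/// Only requires `T: Clone` (plus `Default` to create the initial buffer).
fn filled_by_clone<T: Clone + Default>(src: &[T]) -> Vec<T> {
    let mut dst = vec![T::default(); src.len()];
    // Panics if the two slices differ in length; they match here.
    dst.clone_from_slice(src);
    dst
}

/// Hypothetical helper: same idea, but with `copy_from_slice`,
/// which is only available when `T: Copy`.
fn filled_by_copy<T: Copy + Default>(src: &[T]) -> Vec<T> {
    let mut dst = vec![T::default(); src.len()];
    dst.copy_from_slice(src);
    dst
}

fn main() {
    // String is Clone but not Copy, so only the clone variant compiles for it.
    let words = [String::from("a"), String::from("b")];
    assert_eq!(filled_by_clone(&words), words.to_vec());

    // u32 is Copy (and therefore Clone), so both variants are available.
    let nums = [1u32, 2, 3];
    assert_eq!(filled_by_clone(&nums), filled_by_copy(&nums));
}
```

Both methods panic when the source and destination lengths differ, which is why the helpers size the destination from the source.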
However, it surprises me that the documentation for clone_from_slice says this: "If T implements Copy, it can be more performant to use copy_from_slice." It is surprising that there should be a performance difference here. If T implements Copy, then .clone() is required to be equivalent to copying bits; and since the compiler knows what type T is, it should be able to figure out that it can do a bitwise copy even if I use clone_from_slice.
So where does the performance inefficiency arise from?
Upvotes: 3
Views: 3764
Reputation: 8803
TL;DR: Check the source of clone_from_slice. It visits every element of the slice and calls clone on each one, while copy_from_slice copies all the bits at once with memcpy.
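The difference between the two strategies can be sketched roughly as follows. This is not the standard library's actual source; clone_each and copy_bulk are hypothetical stand-ins that only illustrate "loop and clone" versus "one bulk copy".

```rust
use std::ptr;

/// Hypothetical stand-in for the element-by-element strategy:
/// walk both slices and clone each element in turn.
fn clone_each<T: Clone>(dst: &mut [T], src: &[T]) {
    assert_eq!(dst.len(), src.len(), "slices must have equal length");
    for (d, s) in dst.iter_mut().zip(src) {
        d.clone_from(s);
    }
}

/// Hypothetical stand-in for the bulk strategy:
/// a single non-overlapping memory copy (what memcpy does).
fn copy_bulk<T: Copy>(dst: &mut [T], src: &[T]) {
    assert_eq!(dst.len(), src.len(), "slices must have equal length");
    // SAFETY: both pointers are valid for `src.len()` elements, `T: Copy`
    // makes a bitwise copy a valid duplicate, and a `&mut` slice cannot
    // overlap a shared slice that is borrowed at the same time.
    unsafe {
        ptr::copy_nonoverlapping(src.as_ptr(), dst.as_mut_ptr(), src.len());
    }
}

fn main() {
    let src = [1u8, 2, 3, 4];
    let mut a = [0u8; 4];
    let mut b = [0u8; 4];
    clone_each(&mut a, &src);
    copy_bulk(&mut b, &src);
    // Same result either way; only the amount of work per element differs.
    assert_eq!(a, b);
}
```

For simple element types the optimizer may well compile the loop down to the same bulk copy, so the two versions are not guaranteed to differ in practice.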
Note: As of Rust 1.52.0, clone_from_slice is implemented via specialization, so if you call clone_from_slice with Copy types it will call copy_from_slice internally. (reference)
"If T implements Copy, then .clone() is required to be equivalent to copying bits"
Even if every Copy type implemented Clone by default, with clone simply performing the copy, clone_from_slice would still traverse the slice and copy the elements one at a time.
But that is not the case: the proposition holds for primitives, but not for cases like the one below:
#[derive(Copy)]
struct X;

impl Clone for X {
    fn clone(&self) -> Self {
        // Do some heavy or light operation (depends on the logic).
        X
    }
}
While Clone can be implemented with arbitrary logic, Copy types are always duplicated by simply copying their bits.
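To make the distinction observable, the X example can be extended with a call counter. This is only a sketch: CLONE_CALLS and clone_calls_after_copy_and_clone are hypothetical names added for illustration, and the example calls .clone() explicitly rather than going through clone_from_slice.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical counter that records how often the custom `clone` runs.
static CLONE_CALLS: AtomicUsize = AtomicUsize::new(0);

#[derive(Copy)]
struct X;

impl Clone for X {
    fn clone(&self) -> Self {
        // Stands in for the "heavy or light operation" of a custom Clone.
        CLONE_CALLS.fetch_add(1, Ordering::SeqCst);
        X
    }
}

/// Returns how many `clone` calls were recorded after a plain copy,
/// and then after an explicit `.clone()`, relative to the starting count.
fn clone_calls_after_copy_and_clone() -> (usize, usize) {
    let start = CLONE_CALLS.load(Ordering::SeqCst);
    let a = X;

    let _copied = a; // Implicit copy: bits are duplicated, no user code runs.
    let after_copy = CLONE_CALLS.load(Ordering::SeqCst) - start;

    let _cloned = a.clone(); // Explicit clone: the custom method runs.
    let after_clone = CLONE_CALLS.load(Ordering::SeqCst) - start;

    (after_copy, after_clone)
}

fn main() {
    let (after_copy, after_clone) = clone_calls_after_copy_and_clone();
    println!("clone calls after copy: {after_copy}, after clone: {after_clone}");
}
```

The copy never touches the counter, while the explicit clone does, which is why a compiler cannot blindly replace a user-written clone with a bitwise copy.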
"If T implements Copy, it can be more performant to use copy_from_slice"
The important point here is that the documentation says "it can be", not "it will be". This leaves several possibilities:
- The Clone implementation directly uses the Copy implementation. For basic types like primitives, the optimizer may use memcpy directly instead of traversing. In that case the proposition does not hold, because neither function will be more performant than the other.
- The Clone implementation directly uses the Copy implementation. For complex types, the traversal issue described above makes the proposition correct. (I've edited the example from @kmdreko to use a slightly more complex structure; please check the result on godbolt.)
- The Clone implementation is custom and the type is Copy. This makes the proposition correct: even if the custom implementation is inexpensive, for large slices using memcpy might still be more beneficial.
Upvotes: 5