Reputation: 1264
Let's say I have the following struct in Rust:
struct Num {
pub num: i32;
}
impl Num {
pub fn new(x: i32) -> Num {
Num { num: x }
}
}
impl Clone for Num {
fn clone(&self) -> Num {
Num { num: self.num }
}
}
impl Copy for Num { }
impl Add<Num> for Num {
type Output = Num;
fn add(self, rhs: Num) -> Num {
Num { num: self.num + rhs.num }
}
}
And then I have the following code snippet:
let a = Num::new(0);
let b = Num::new(1);
let c = a + b;
let d = a + b;
This works because Num
is marked as Copy
. Otherwise, the second addition would be a compilation error, since a
and b
had already been moved into the add
function during the first addition (I think).
The question is what the emitted assembly does. When the add
function is called, are two copies of the arguments made into the new stack frame, or is the Rust compiler smart enough to know that in this case, it's not necessary to do that copying?
If the Rust compiler isn't smart enough, and actually does the copying like a function with a argument passed by value in C++, how do you avoid the performance overhead in cases where it matters?
The context is I am implementing a matrix class (just to learn), and if I have a 100x100 matrix, I really don't want to be invoking two copies every time I try to do a multiply or add.
Upvotes: 3
Views: 1626
Reputation: 430791
The question is what the emitted assembly does
There's no need to guess; you can just look. Let's use this code:
use std::ops::Add;
#[derive(Copy, Clone, Debug)]
struct Num(i32);
impl Add for Num {
type Output = Num;
fn add(self, rhs: Num) -> Num {
Num(self.0 + rhs.0)
}
}
#[inline(never)]
fn example() -> Num {
let a = Num(0);
let b = Num(1);
let c = a + b;
let d = a + b;
c + d
}
fn main() {
println!("{:?}", example());
}
Paste it into the Rust Playground, then select the Release mode and view the LLVM IR:
Search through the result to see the definition of the example
function:
; playground::example
; Function Attrs: noinline norecurse nounwind nonlazybind readnone uwtable
define internal fastcc i32 @_ZN10playground7example17h60e923840d8c0cd0E() unnamed_addr #2 {
start:
ret i32 2
}
That's right, this was completely and totally evaluated at compile time and simplified all the way down to a simple constant. Compilers are pretty good nowadays.
Maybe you want to try something not quite as hardcoded?
#[inline(never)]
fn example(a: Num, b: Num) -> Num {
let c = a + b;
let d = a + b;
c + d
}
fn main() {
let something = std::env::args().count();
println!("{:?}", example(Num(something as i32), Num(1)));
}
Produces
; playground::example
; Function Attrs: noinline norecurse nounwind nonlazybind readnone uwtable
define internal fastcc i32 @_ZN10playground7example17h73d4138fe5e9856fE(i32 %a) unnamed_addr #3 {
start:
%0 = shl i32 %a, 1
%1 = add i32 %0, 2
ret i32 %1
}
Oops, the compiler saw that we we basically doing (x + 1) * 2, so it did some tricky optimizations here to get to 2x + 2. Let's try something harder...
#[inline(never)]
fn example(a: Num, b: Num) -> Num {
a + b
}
fn main() {
let something = std::env::args().count() as i32;
let another = std::env::vars().count() as i32;
println!("{:?}", example(Num(something), Num(another)));
}
Produces
; playground::example
; Function Attrs: noinline norecurse nounwind nonlazybind readnone uwtable
define internal fastcc i32 @_ZN10playground7example17h73d4138fe5e9856fE(i32 %a, i32 %b) unnamed_addr #3 {
start:
%0 = add i32 %b, %a
ret i32 %0
}
A simple add
instruction.
The real takeaway from this is:
is the Rust compiler smart enough to know that in this case, it's not necessary to do that copying?
As you just saw, the Rust compiler plus LLVM are pretty smart. In general, it is possible to elide copies when it knows that the operand isn't needed. Whether it will work in your case or not is tough to answer.
Even if it did, you might not want to be passing large items via the stack as it's always possible that it will need to be copied.
And note that you don't have to implement copy for the value, you can choose to only allow it via references:
impl<'a, 'b> Add<&'b Num> for &'a Num {
type Output = Num;
fn add(self, rhs: &'b Num) -> Num {
Num(self.0 + rhs.0)
}
}
In fact, you may want to implement both ways of adding them, and maybe all 4 permutations of value / reference!
Upvotes: 15
Reputation: 60137
If the Rust compiler isn't smart enough, and actually does the copying like a function with a argument passed by value in C++, how do you avoid the performance overhead in cases where it matters?
The context is I am implementing a matrix class (just to learn), and if I have a 100x100 matrix, I really don't want to be invoking two copies every time I try to do a multiply or add.
All of Rust's implicit copies (be them from moves or actual Copy
types) are a shallow memcpy
. If you heap allocate, only the pointers and such are copied. Unlike C++, passing a vector by value will only copy three pointer-sized values.
To copy heap memory, an explicit copy must be made, normally by calling .clone()
, implemented with #[derive(Clone)]
or impl Clone
.
I've talked in more detail about this elsewhere.
Shepmaster points out that shallow copies are often messed with a lot by the compiler - generally only heap memory and massive stack values cause problems.
Upvotes: 6