anjruu
anjruu

Reputation: 1264

Do copy semantics in Rust literally become a copy in memory?

Let's say I have the following struct in Rust:

struct Num {
  pub num: i32;
}

impl Num {
  pub fn new(x: i32) -> Num {
    Num { num: x }
  }
}

impl Clone for Num {
  fn clone(&self) -> Num {
    Num { num: self.num }
  }
}

impl Copy for Num { }

impl Add<Num> for Num {
  type Output = Num;
  fn add(self, rhs: Num) -> Num {
    Num { num: self.num + rhs.num }
  }
}

And then I have the following code snippet:

let a = Num::new(0);
let b = Num::new(1);
let c = a + b;
let d = a + b;

This works because Num is marked as Copy. Otherwise, the second addition would be a compilation error, since a and b had already been moved into the add function during the first addition (I think).

The question is what the emitted assembly does. When the add function is called, are two copies of the arguments made into the new stack frame, or is the Rust compiler smart enough to know that in this case, it's not necessary to do that copying?

If the Rust compiler isn't smart enough, and actually does the copying like a function with a argument passed by value in C++, how do you avoid the performance overhead in cases where it matters?

The context is I am implementing a matrix class (just to learn), and if I have a 100x100 matrix, I really don't want to be invoking two copies every time I try to do a multiply or add.

Upvotes: 3

Views: 1626

Answers (2)

Shepmaster
Shepmaster

Reputation: 430791

The question is what the emitted assembly does

There's no need to guess; you can just look. Let's use this code:

use std::ops::Add;

#[derive(Copy, Clone, Debug)]
struct Num(i32);

impl Add for Num {
    type Output = Num;

    fn add(self, rhs: Num) -> Num {
        Num(self.0 + rhs.0)
    }
}

#[inline(never)]
fn example() -> Num {
    let a = Num(0);
    let b = Num(1);
    let c = a + b;
    let d = a + b;
    c + d
}

fn main() {
    println!("{:?}", example());
}

Paste it into the Rust Playground, then select the Release mode and view the LLVM IR:

release mode LLVM IR

Search through the result to see the definition of the example function:

; playground::example
; Function Attrs: noinline norecurse nounwind nonlazybind readnone uwtable
define internal fastcc i32 @_ZN10playground7example17h60e923840d8c0cd0E() unnamed_addr #2 {
start:
  ret i32 2
}

That's right, this was completely and totally evaluated at compile time and simplified all the way down to a simple constant. Compilers are pretty good nowadays.

Maybe you want to try something not quite as hardcoded?

#[inline(never)]
fn example(a: Num, b: Num) -> Num {
    let c = a + b;
    let d = a + b;
    c + d
}

fn main() {
    let something = std::env::args().count();
    println!("{:?}", example(Num(something as i32), Num(1)));
}

Produces

; playground::example
; Function Attrs: noinline norecurse nounwind nonlazybind readnone uwtable
define internal fastcc i32 @_ZN10playground7example17h73d4138fe5e9856fE(i32 %a) unnamed_addr #3 {
start:
  %0 = shl i32 %a, 1
  %1 = add i32 %0, 2
  ret i32 %1
}

Oops, the compiler saw that we we basically doing (x + 1) * 2, so it did some tricky optimizations here to get to 2x + 2. Let's try something harder...

#[inline(never)]
fn example(a: Num, b: Num) -> Num {
    a + b
}

fn main() {
    let something = std::env::args().count() as i32;
    let another = std::env::vars().count() as i32;
    println!("{:?}", example(Num(something), Num(another)));
}

Produces

; playground::example
; Function Attrs: noinline norecurse nounwind nonlazybind readnone uwtable
define internal fastcc i32 @_ZN10playground7example17h73d4138fe5e9856fE(i32 %a, i32 %b) unnamed_addr #3 {
start:
  %0 = add i32 %b, %a
  ret i32 %0
}

A simple add instruction.

The real takeaway from this is:

  1. Look at the generated assembly for your cases. Even similar-looking code might optimize differently.
  2. Perform micro and macro benchmarking. You never know exactly how the code will play out in the big picture. Maybe all your cache will be blown, but your micro benchmarks will be stellar.

is the Rust compiler smart enough to know that in this case, it's not necessary to do that copying?

As you just saw, the Rust compiler plus LLVM are pretty smart. In general, it is possible to elide copies when it knows that the operand isn't needed. Whether it will work in your case or not is tough to answer.

Even if it did, you might not want to be passing large items via the stack as it's always possible that it will need to be copied.

And note that you don't have to implement copy for the value, you can choose to only allow it via references:

impl<'a, 'b> Add<&'b Num> for &'a Num {
    type Output = Num;

    fn add(self, rhs: &'b Num) -> Num {
        Num(self.0 + rhs.0)
    }
}

In fact, you may want to implement both ways of adding them, and maybe all 4 permutations of value / reference!

Upvotes: 15

Veedrac
Veedrac

Reputation: 60137

If the Rust compiler isn't smart enough, and actually does the copying like a function with a argument passed by value in C++, how do you avoid the performance overhead in cases where it matters?

The context is I am implementing a matrix class (just to learn), and if I have a 100x100 matrix, I really don't want to be invoking two copies every time I try to do a multiply or add.

All of Rust's implicit copies (be them from moves or actual Copy types) are a shallow memcpy. If you heap allocate, only the pointers and such are copied. Unlike C++, passing a vector by value will only copy three pointer-sized values.

To copy heap memory, an explicit copy must be made, normally by calling .clone(), implemented with #[derive(Clone)] or impl Clone.

I've talked in more detail about this elsewhere.

Shepmaster points out that shallow copies are often messed with a lot by the compiler - generally only heap memory and massive stack values cause problems.

Upvotes: 6

Related Questions