carloabelli
carloabelli

Reputation: 4349

What optimization is making my rust program so much faster?

Frustrated that I could not solve a Sudoku puzzle, I quickly hacked together a simple recursive backtracking solver:

fn is_allowed(board: &[[u8; 9]; 9], row: usize, col: usize, x: u8) -> bool {
    for i in 0..9 {
        if board[row][i] == x {
            return false;
        }
        if board[i][col] == x {
            return false;
        }
    }

    let r = row - (row % 3);
    let c = col - (col % 3);
    for i in r..(r + 3) {
        for j in c..(c + 3) {
            if board[i][j] == x {
                return false;
            }
        }
    }

    true
}

fn solve(board: &mut [[u8; 9]; 9]) -> bool {
    for i in 0..9 {
        for j in 0..9 {
            if board[i][j] == 0 {
                for x in 1..=9 {
                    if is_allowed(board, i, j, x) {
                        board[i][j] = x;
                        if solve(board) {
                            return true
                        }
                        board[i][j] = 0;
                    }
                }
                return false;
            }
        }
    }
    true
}

fn main() {
    let mut board = [
        [ 0, 0, 8, 0, 0, 4, 0, 0, 0 ],
        [ 0, 0, 0, 0, 0, 0, 0, 0, 7 ],
        [ 0, 0, 6, 0, 0, 0, 0, 1, 0 ],
        [ 0, 0, 0, 0, 0, 0, 5, 0, 9 ],
        [ 0, 0, 0, 6, 0, 0, 0, 0, 0 ],
        [ 0, 2, 0, 8, 1, 0, 0, 0, 0 ],
        [ 9, 4, 0, 0, 0, 0, 0, 0, 0 ],
        [ 0, 0, 0, 0, 0, 0, 1, 8, 0 ],
        [ 0, 0, 7, 0, 0, 5, 0, 0, 0 ],
    ];

    if solve(&mut board) {
        for i in 0..9 {
            println!("{:?}", board[i]);
        }
    } else {
        println!("no solution");
    }
}

When running without optimizations (cargo run), it takes more than 6 minutes to run.

When running with optimizations (cargo run --release), it takes about 7 seconds to run.

What optimization is leading to such a difference?

Upvotes: 3

Views: 516

Answers (1)

rodrigo
rodrigo

Reputation: 98526

It is difficult to be sure without analyzing the generated assembly, but I think that it is related to the combination of these optimizations:

  • Loop unrolling: the compiler knows that each loop runs 8 times, so instead of a loop it compiles the loop body 8 times with the index as a constant. This is why the godbold link by @MatthieuM in the comments above is so long.
  • Range checking: since i, j and k are now constants (in the unrolled loops) and the arrays are of known size, the compiler will omit all range checks.
  • Function inlining.

In fact, each time you write board[i][j]:

  • In debug mode, you are calling two library functions that do checkings and computations.
  • In release mode each instance of such code is just a read or write to fixed offset in the stack.

Upvotes: 3

Related Questions