Why is appending on a preallocated slice quicker than indexing?

Question

I wrote the following Go benchmarks (Go1.9.2, Linux amd64) to test indexing vs appending:

func BenchmarkIndex(b *testing.B) {
    for i := 0; i < b.N; i++ {
        existing := make([]int64, 1000, 1000)
        init := make([]int64, 1000) // len 1000, cap 1000
        for index, element := range existing {
            init[index] = element
        }
    }
}

func BenchmarkAppend(b *testing.B) {
    for i := 0; i < b.N; i++ {
        existing := make([]int64, 1000, 1000)
        init := make([]int64, 0, 1000) // len 0, capacity 1000
        for _, element := range existing {
            init = append(init, element)
        }
    }
}

And got the following results:

goos: linux
goarch: amd64
BenchmarkIndex-4         1000000          2183 ns/op
BenchmarkAppend-4        1000000          1933 ns/op
PASS

I'm a little confused on why append seems to perform better than in general than indexing into the slice. Does it have anything to do with omitting the index inside the range? If not, what is append doing under the hood to make it quicker than just indexing?

For reference, here is append's source. Thanks!

peterSO · Accepted Answer

On computer one:

$ go test same_test.go -run=! -bench=. -benchmem -count=3
goos: linux
goarch: amd64
BenchmarkIndex-4     1000000    1290 ns/op    0 B/op    0 allocs/op
BenchmarkIndex-4     1000000    1290 ns/op    0 B/op    0 allocs/op
BenchmarkIndex-4     1000000    1288 ns/op    0 B/op    0 allocs/op
BenchmarkAppend-4    1000000    1299 ns/op    0 B/op    0 allocs/op
BenchmarkAppend-4    1000000    1302 ns/op    0 B/op    0 allocs/op
BenchmarkAppend-4    1000000    1301 ns/op    0 B/op    0 allocs/op
PASS
ok      command-line-arguments  7.868s
$ go version
go version devel +38c725b148 Sun Nov 5 17:30:11 2017 +0000 linux/amd64
$

The difference between BenchmarkAppend and BenchmarkIndex is:

((1299+1302+1301) - (1290+1290+1288)) * 100 / (1290+1290+1288) =  +0.879%

They are the same.

On computer two:

$ go test same_test.go -run=! -bench=. -benchmem -count=3
goos: linux
goarch: amd64
BenchmarkIndex-8     2000000    706 ns/op    0 B/op    0 allocs/op
BenchmarkIndex-8     2000000    713 ns/op    0 B/op    0 allocs/op
BenchmarkIndex-8     2000000    710 ns/op    0 B/op    0 allocs/op
BenchmarkAppend-8    2000000    711 ns/op    0 B/op    0 allocs/op
BenchmarkAppend-8    2000000    718 ns/op    0 B/op    0 allocs/op
BenchmarkAppend-8    2000000    718 ns/op    0 B/op    0 allocs/op
PASS
ok      command-line-arguments  12.945s
$ go version
go version devel +38c725b148 Sun Nov 5 17:30:11 2017 +0000 linux/amd64
$

The difference between BenchmarkAppend and BenchmarkIndex is:

((711+718+718) - (706+713+710)) * 100 / (706+713+710) =  +0.845%

They are the same.

Let's measure one thing at a time, indexing versus appending over a preallocated slice, which avoids confounding issues like memory management.

package main

import "testing"

func BenchmarkIndexing(b *testing.B) {
    existing := make([]int64, 1000, 1000)
    init := make([]int64, 1000) // len 1000, cap 1000
    b.ReportAllocs()
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        init = init[:cap(init)]
        for index, element := range existing {
            init[index] = element
        }
    }
}

func BenchmarkAppending(b *testing.B) {
    existing := make([]int64, 1000, 1000)
    init := make([]int64, 0, 1000) // len 0, capacity 1000
    b.ReportAllocs()
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        init = init[:0]
        for _, element := range existing {
            init = append(init, element)
        }
    }
}

Output:

$ go test same_test.go -run=! -bench=ing -benchmem -count=3
goos: linux
goarch: amd64
BenchmarkIndexing-4     2000000    736 ns/op    0 B/op    0 allocs/op
BenchmarkIndexing-4     2000000    735 ns/op    0 B/op    0 allocs/op
BenchmarkIndexing-4     2000000    735 ns/op    0 B/op    0 allocs/op
BenchmarkAppending-4    2000000    733 ns/op    0 B/op    0 allocs/op
BenchmarkAppending-4    2000000    733 ns/op    0 B/op    0 allocs/op
BenchmarkAppending-4    2000000    733 ns/op    0 B/op    0 allocs/op
PASS
ok      command-line-arguments  13.303s
$ go version
go version devel +38c725b148 Sun Nov 5 17:30:11 2017 +0000 linux/amd64
$

The difference between BenchmarkAppend and BenchmarkIndex is:

((733+733+733) - (736+735+735)) * 100 / (736+735+735) =  −0.317%

They are the same.

Why is appending on a preallocated slice quicker than indexing?

Answers (1)

Related Questions