Terry Pang

Reputation: 369

Go make slice is a little bit slower than []int{1, 1, 1, 1}

I'm working on a program that allocates lots of []int slices of length 4, 3, and 2, and I found that a := []int{1, 1, 1} is a little bit faster than a := make([]int, 3); a[0] = 1; a[1] = 1; a[2] = 1.

My question: why is a := []int{1, 1, 1} faster than a := make([]int, 3); a[0] = 1; a[1] = 1; a[2] = 1?

func BenchmarkMake(b *testing.B) {
    var array []int
    for i := 0; i < b.N; i++ {
        array = make([]int, 4)
        array[0] = 1
        array[1] = 1
        array[2] = 1
        array[3] = 1
    }
}

func BenchmarkDirect(b *testing.B) {
    var array []int
    for i := 0; i < b.N; i++ {
        array = []int{1, 1, 1, 1}
    }

    array[0] = 1
}

BenchmarkMake-4 50000000 34.3 ns/op
BenchmarkDirect-4 50000000 33.8 ns/op

Upvotes: 1

Views: 134

Answers (1)

Grzegorz Żur

Reputation: 49171

Let's look at the benchmark output of the following code:

package main

import "testing"

func BenchmarkMake(b *testing.B) {
    var array []int
    for i := 0; i < b.N; i++ {
        array = make([]int, 4)
        array[0] = 1
        array[1] = 1
        array[2] = 1
        array[3] = 1
    }
}

func BenchmarkDirect(b *testing.B) {
    var array []int
    for i := 0; i < b.N; i++ {
        array = []int{1, 1, 1, 1}
    }
    array[0] = 1
}

func BenchmarkArray(b *testing.B) {
    var array [4]int
    for i := 0; i < b.N; i++ {
        array = [4]int{1, 1, 1, 1}
    }
    array[0] = 1
}

Usually the output looks like this:

$ go test -bench . -benchmem -o alloc_test -cpuprofile cpu.prof
goos: linux
goarch: amd64
pkg: test
BenchmarkMake-8         30000000                61.3 ns/op            32 B/op          1 allocs/op
BenchmarkDirect-8       20000000                60.2 ns/op            32 B/op          1 allocs/op
BenchmarkArray-8        1000000000               2.56 ns/op            0 B/op          0 allocs/op
PASS
ok      test    6.003s

The difference is so small that it can be the opposite in some circumstances.
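
With differences this small, it helps to run each benchmark several times and compare the results statistically. One way to do that is with the benchstat tool from golang.org/x/perf (the output file name here is just an example):

$ go test -bench . -benchmem -count=10 > results.txt
$ benchstat results.txt

benchstat reports the mean and variation per benchmark, which makes it easier to tell a real difference from noise.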

Let's look at the profiling data:

$ go tool pprof -list 'Benchmark.*' cpu.prof

ROUTINE ======================== test.BenchmarkMake in /home/grzesiek/go/src/test/alloc_test.go
     260ms      1.59s (flat, cum) 24.84% of Total
         .          .      5:func BenchmarkMake(b *testing.B) {
         .          .      6:   var array []int
      40ms       40ms      7:   for i := 0; i < b.N; i++ {
      50ms      1.38s      8:       array = make([]int, 4)
         .          .      9:       array[0] = 1
     130ms      130ms     10:       array[1] = 1
      20ms       20ms     11:       array[2] = 1
      20ms       20ms     12:       array[3] = 1
         .          .     13:   }
         .          .     14:}
ROUTINE ======================== test.BenchmarkDirect in /home/grzesiek/go/src/test/alloc_test.go
      90ms      1.66s (flat, cum) 25.94% of Total
         .          .     16:func BenchmarkDirect(b *testing.B) {
         .          .     17:   var array []int
      10ms       10ms     18:   for i := 0; i < b.N; i++ {
      80ms      1.65s     19:       array = []int{1, 1, 1, 1}
         .          .     20:   }
         .          .     21:   array[0] = 1
         .          .     22:}
ROUTINE ======================== test.BenchmarkArray in /home/grzesiek/go/src/test/alloc_test.go
     2.86s      2.86s (flat, cum) 44.69% of Total
         .          .     24:func BenchmarkArray(b *testing.B) {
         .          .     25:   var array [4]int
     500ms      500ms     26:   for i := 0; i < b.N; i++ {
     2.36s      2.36s     27:       array = [4]int{1, 1, 1, 1}
         .          .     28:   }
         .          .     29:   array[0] = 1
         .          .     30:}

We can see that the assignments take some time.

To learn why, we need to look at the assembly code:

$ go tool pprof -disasm 'BenchmarkMake' cpu.prof

     .          .     4eda93: MOVQ AX, 0(SP)                             ;alloc_test.go:8
  30ms       30ms     4eda97: MOVQ $0x4, 0x8(SP)                         ;test.BenchmarkMake alloc_test.go:8
     .          .     4edaa0: MOVQ $0x4, 0x10(SP)                       ;alloc_test.go:8
  10ms      1.34s     4edaa9: CALL runtime.makeslice(SB)                 ;test.BenchmarkMake alloc_test.go:8
     .          .     4edaae: MOVQ 0x18(SP), AX                       ;alloc_test.go:8
  10ms       10ms     4edab3: MOVQ 0x20(SP), CX                       ;test.BenchmarkMake alloc_test.go:8
     .          .     4edab8: TESTQ CX, CX                             ;alloc_test.go:9
     .          .     4edabb: JBE 0x4edb0b
     .          .     4edabd: MOVQ $0x1, 0(AX)
 130ms      130ms     4edac4: CMPQ $0x1, CX                           ;test.BenchmarkMake alloc_test.go:10
     .          .     4edac8: JBE 0x4edb04                             ;alloc_test.go:10
     .          .     4edaca: MOVQ $0x1, 0x8(AX)
  20ms       20ms     4edad2: CMPQ $0x2, CX                           ;test.BenchmarkMake alloc_test.go:11
     .          .     4edad6: JBE 0x4edafd                             ;alloc_test.go:11
     .          .     4edad8: MOVQ $0x1, 0x10(AX)
     .          .     4edae0: CMPQ $0x3, CX                           ;alloc_test.go:12
     .          .     4edae4: JA 0x4eda65

We can see that the time is spent on the CMPQ instructions, which compare a constant against the CX register. CX holds the value copied from the stack after the call to runtime.makeslice; we can deduce that it is the length of the slice, while AX holds the pointer to the underlying array. You can also see that the first bounds check was optimized away.
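
One way to confirm that the per-element checks are really the extra cost is a common bounds-check elimination hint: index the last element once, so the compiler can prove that the remaining indices are in range. This is only a sketch (the benchmark name is mine); whether the checks are actually elided depends on the compiler version, so it is worth re-checking the disassembly after the change.

func BenchmarkMakeBCE(b *testing.B) {
    var array []int
    for i := 0; i < b.N; i++ {
        array = make([]int, 4)
        _ = array[3] // bounds-check hint: after this, indices 0..3 are provably in range
        array[0] = 1
        array[1] = 1
        array[2] = 1
        array[3] = 1
    }
}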

Conclusions

  1. The allocation takes the same time in both cases, but the assignments cost extra because of the slice bounds checks (as noticed by Terry Pang).
  2. Using an array instead of a slice is much cheaper, as it saves the heap allocation.

Why is using an array so much cheaper?

In Go, an array is basically a fixed-size chunk of memory; a [1]int is essentially the same thing as an int. You can find more in the Go Slices: usage and internals article.
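
A small, self-contained illustration of that value-type nature (this is just a sketch to make the point, not part of the benchmark above):

package main

import "fmt"

func main() {
    // [4]int is a value type: assigning it copies all four elements,
    // just like assigning a struct, so no separate allocation is needed.
    a := [4]int{1, 1, 1, 1}
    b := a // copies the whole array
    b[0] = 42
    fmt.Println(a[0], b[0]) // prints: 1 42

    // A slice is a small header pointing at a separately allocated
    // backing array, so assignment shares the elements.
    s := []int{1, 1, 1, 1}
    t := s // copies only the header; the elements are shared
    t[0] = 42
    fmt.Println(s[0], t[0]) // prints: 42 42
}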

Upvotes: 1
