Reputation: 195
I have recently been running some numerical codes written in Go on large datasets and have been encountering memory management issues. While attempting to profile the problem, I have measured the memory usage of my program in three different ways: with Go's runtime/pprof
package, with the unix time
utility, and by manually adding up the size of the data that I allocated. These three methods do not give me consistent results.
Below is a simplified version of the code that I am profiling. It allocates several slices, puts values at every index and places each of them inside of a parent slice:
package main
import (
"fmt"
"os"
"runtime/pprof"
"unsafe"
"flag"
)
var mprof = flag.String("mprof", "", "write memory profile to this file")
func main() {
flag.Parse()
N := 1<<15
psSlice := make([][]int64, N)
_ = psSlice
size := 0
for i := 0; i < N; i++ {
ps := make([]int64, 1<<10)
for i := range ps { ps[i] = int64(i) }
psSlice[i] = ps
size += int(unsafe.Sizeof(ps[0])) * len(ps)
}
if *mprof != "" {
f, err := os.Create(*mprof)
if err != nil { panic(err) }
pprof.WriteHeapProfile(f)
f.Close()
}
fmt.Printf("total allocated: %d MB\n", size >> 20)
}
Running this with the command $ time time -f "%M kB" ./mem_test -mprof=out.mprof
results in the output:
total allocated: 256 MB
1141216 kB
real 0m0.150s
user 0m0.031s
sys 0m0.113s
Here the first number, 256 MB, is just the size of the arrays computed from unsafe.Sizeof
and the second number, 1055 MB, is what time
reports. Running the pprof tool results in
(pprof) top1
Total: 108.2 MB
107.8 99.5% 99.5% 107.8 99.5% main.main
These results scale smoothly in the way you would expect them to for slices of smaller or larger lengths.
Why don't these three number line up more closely?
Upvotes: 3
Views: 821
Reputation: 166598
First, you need to provide an error free example. Let's start with the basic numbers. For example,
package main
import (
"fmt"
"runtime"
"unsafe"
)
func WriteMatrix(nm [][]int64) {
for n := range nm {
for m := range nm[n] {
nm[n][m]++
}
}
}
func NewMatrix(n, m int) [][]int64 {
a := make([]int64, n*m)
nm := make([][]int64, n)
lo, hi := 0, m
for i := range nm {
nm[i] = a[lo:hi:hi]
lo, hi = hi, hi+m
}
return nm
}
func MatrixSize(nm [][]int64) int64 {
size := int64(0)
for i := range nm {
size += int64(unsafe.Sizeof(nm[i]))
for j := range nm[i] {
size += int64(unsafe.Sizeof(nm[i][j]))
}
}
return size
}
var nm [][]int64
func main() {
n, m := 1<<15, 1<<10
var ms1, ms2 runtime.MemStats
runtime.ReadMemStats(&ms1)
nm = NewMatrix(n, m)
WriteMatrix(nm)
runtime.ReadMemStats(&ms2)
fmt.Println(runtime.GOARCH, runtime.GOOS)
fmt.Println("Actual: ", ms2.TotalAlloc-ms1.TotalAlloc)
fmt.Println("Estimate:", n*3*8+n*m*8)
fmt.Println("Total: ", ms2.TotalAlloc)
fmt.Println("Size: ", MatrixSize(nm))
// check top VIRT and RES for COMMAND peter
for {
WriteMatrix(nm)
}
}
Output:
$ go build peter.go && /usr/bin/time -f "%M KiB" ./peter amd64 linux Actual: 269221888 Estimate: 269221888 Total: 269240592 Size: 269221888 ^C Command exited with non-zero status 2 265220 KiB $ $ top VIRT 284268 RES 265136 COMMAND peter
Is this what you expected?
See MatrixSize
for the correct way to calculate the memory size.
In the infinite loop that allows us to use the top
command, pin the matrix as resident by updating it.
What results do you get when you run this program?
BUG:
Your result from /usr/bin/time
is 1056992 KiB
which too large by a factor of four. It's a bug in your version of /usr/bin/time
, ru_maxrss
is reported in KBytes not pages. My version of Ubuntu has been patched.
References:
Re: GNU time: incorrect results
time-1.7 counts rusage wrong on Linux
“time” 1.7-24 source package in Ubuntu. ru_maxrss
is reported in KBytes not pages. (Closes: #649402)
#649402 - [PATCH] time overestimates max RSS by a factor of 4 - Debian Bug report logs
Subject: Fix ru_maxrss reporting Author: Richard Kettlewell Bug-Debian: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=649402
--- time-1.7.orig/time.c +++ time-1.7/time.c @@ -392,7 +398,7 @@ ptok ((UL) resp->ru.ru_ixrss) / MSEC_TO_TICKS (v)); break; case 'M': /* Maximum resident set size. */ - fprintf (fp, "%lu", ptok ((UL) resp->ru.ru_maxrss)); + fprintf (fp, "%lu", (UL) resp->ru.ru_maxrss); break; case 'O': /* Outputs. */ fprintf (fp, "%ld", resp->ru.ru_oublock);
Upvotes: 5