Reputation: 111
I am loading web pages with a simple thread pool while dynamically reading URLs from a file. But this small program slowly allocates as much memory as my server has, until the OOM killer stops it. It looks like resp.Body.Close() doesn't free the memory used for the body text (memory usage ~ downloaded pages * average page size). How can I force Go to free the memory allocated for the body HTML text?
package main

import (
    "bufio"
    "fmt"
    "io/ioutil"
    "net/http"
    "os"
    "strings"
    "sync"
)

func worker(linkChan chan string, wg *sync.WaitGroup) {
    defer wg.Done()
    for url := range linkChan {
        // Getting body text
        resp, err := http.Get(url)
        if err != nil {
            fmt.Printf("Fail url: %s\n", url)
            continue
        }
        body, err := ioutil.ReadAll(resp.Body)
        resp.Body.Close()
        if err != nil {
            fmt.Printf("Fail url: %s\n", url)
            continue
        }
        // Test page body
        has_rem_code := strings.Contains(string(body), "googleadservices.com/pagead/conversion.js")
        fmt.Printf("Done url: %s\t%t\n", url, has_rem_code)
    }
}

func main() {
    // Creating worker pool
    lCh := make(chan string, 30)
    wg := new(sync.WaitGroup)
    for i := 0; i < 30; i++ {
        wg.Add(1)
        go worker(lCh, wg)
    }
    // Opening file with urls
    file, err := os.Open("./tmp/new.csv")
    if err != nil {
        panic(err)
    }
    defer file.Close()
    reader := bufio.NewReader(file)
    // Processing urls
    for href, _, err := reader.ReadLine(); err == nil; href, _, err = reader.ReadLine() {
        lCh <- string(href)
    }
    close(lCh)
    wg.Wait()
}
Here is some output from the pprof tool:
      flat  flat%   sum%        cum   cum%
   34.63MB 29.39% 29.39%    34.63MB 29.39%  bufio.NewReaderSize
      30MB 25.46% 54.84%       30MB 25.46%  net/http.(*Transport).getIdleConnCh
   23.09MB 19.59% 74.44%    23.09MB 19.59%  bufio.NewWriter
   11.63MB  9.87% 84.30%    11.63MB  9.87%  net/http.(*Transport).putIdleConn
    6.50MB  5.52% 89.82%     6.50MB  5.52%  main.main
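(For context, since the post doesn't show it: one common way to get a heap profile like this is to expose the net/http/pprof handlers on a side port and point go tool pprof at them. This is a minimal sketch, not the exact setup used here; the port 6060 is just an example, and it assumes the log and net/http imports.)

import (
    _ "net/http/pprof" // registers the /debug/pprof handlers on the default mux
    "log"
    "net/http"
)

func init() {
    // Serve the profiling endpoints on a side port while the crawler runs.
    go func() {
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()
}

Then, in another terminal: go tool pprof http://localhost:6060/debug/pprof/heap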
This looks like this issue, but it was fixed two years ago.
Upvotes: 6
Views: 2879
Reputation: 111
Found the answer in this thread on golang-nuts. http.Transport caches connections for future reuse when requests go to the same host, which caused the memory bloat in my case (hundreds of thousands of different hosts). Disabling keep-alives completely solves the problem.
Working code:
func worker(linkChan chan string, wg *sync.WaitGroup) {
    defer wg.Done()
    var transport http.RoundTripper = &http.Transport{
        DisableKeepAlives: true,
    }
    c := &http.Client{Transport: transport}
    for url := range linkChan {
        // Getting body text
        resp, err := c.Get(url)
        if err != nil {
            fmt.Printf("Fail url: %s\n", url)
            continue
        }
        body, err := ioutil.ReadAll(resp.Body)
        resp.Body.Close()
        if err != nil {
            fmt.Printf("Fail url: %s\n", url)
            continue
        }
        // Test page body
        has_rem_code := strings.Contains(string(body), "googleadservices.com/pagead/conversion.js")
        fmt.Printf("Done url: %s\t%t\n", url, has_rem_code)
    }
}
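As a side note, this variant is my own sketch rather than part of the fix above (the helper name newClient is made up): instead of disabling keep-alives entirely, the idle-connection cache can be capped per host and flushed periodically, which keeps connection reuse for hosts that are hit more than once. It assumes the net/http and time imports.

func newClient() *http.Client {
    tr := &http.Transport{
        MaxIdleConnsPerHost: 1, // cache at most one idle connection per host
    }
    // Drop every cached idle connection once a minute so the cache cannot
    // grow with the number of distinct hosts crawled.
    go func() {
        for range time.Tick(time.Minute) {
            tr.CloseIdleConnections()
        }
    }()
    return &http.Client{Transport: tr}
}

Either way, the Transport (and the Client wrapping it) is safe for concurrent use, so it can be created once and shared by all workers instead of building one per goroutine.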
Upvotes: 5