Willem
Willem

Reputation: 3253

Go strings.Contains() 2x slower than Python3?

Am converting a text pattern scanner from Python3 to Go1.10, but am surprised it is actually 2 times slower. Upon profiling, the culprit is in strings.Contains(). See the simple benchmarks below. Did I miss anything? Could you recommend a faster pattern search algorithm for Go that would perform better in this case? I'm not bothered about startup time, the same pattern will be used to scan millions of files.

Py3 benchmark:

import time
import re

RUNS = 10000

if __name__ == '__main__':
    with open('data.php') as fh:
        testString = fh.read()

    def do():
        return "576ad4f370014dfb1d0f17b0e6855f22" in testString

    start = time.time()
    for i in range(RUNS):
        _ = do()
    duration = time.time() - start
    print("Python: %.2fs" % duration)

Go1.10 benchmark:

package main

import (
    "fmt"
    "io/ioutil"
    "log"
    "strings"
    "time"
)

const (
    runs = 10000
)

func main() {
    fname := "data.php"
    testdata := readFile(fname)
    needle := "576ad4f370014dfb1d0f17b0e6855f22"
    start := time.Now()

    for i := 0; i < runs; i++ {
        _ = strings.Contains(testdata, needle)

    }
    duration := time.Now().Sub(start)
    fmt.Printf("Go: %.2fs\n", duration.Seconds())
}

func readFile(fname string) string {
    data, err := ioutil.ReadFile(fname)
    if err != nil {
        log.Fatal(err)
    }
    return string(data)
}

data.php is a 528KB file that can be found here.

Output:

Go:     1.98s
Python: 0.84s

Upvotes: 0

Views: 1581

Answers (2)

Willem
Willem

Reputation: 3253

I've done more benchmarking with various string search implementations that I found on Wikipedia, such as:

Benchmark results (code here):

BenchmarkStringsContains-4         10000        208055 ns/op
BenchmarkBMSSearch-4                1000       1856732 ns/op
BenchmarkPaddieKMP-4                2000       1069495 ns/op
BenchmarkKkdaiKMP-4                 1000       1440147 ns/op
BenchmarkAhocorasick-4              2000        935885 ns/op
BenchmarkYara-4                     1000       1237887 ns/op

Then, I benchmarked my practical use case of testing about 1100 signatures (100 regex, 1000 literals) against a 500KB no-match file, for both the native (strings.Contains and regexp) and the C-based Yara implementations:

BenchmarkScanNative-4              2     824328504 ns/op
BenchmarkScanYara-4              300       5338861 ns/op

Even though C calls in Go are supposedly expensive, in these "heavy" operations the profit is remarkable. Side observation: it takes Yara just 5 times as much CPU time to match 1100 signatures instead of 1.

Upvotes: 1

peterSO
peterSO

Reputation: 166626

Why is Python 3 (24.79s) 4.5 times slower than Go (5.47s)? What results do you get?

Python:

$ cat contains.py
import time
import re

RUNS = 10000

if __name__ == '__main__':
    # The Complete Works of William Shakespeare by William Shakespeare
    # http://www.gutenberg.org/files/100/100-0.txt
    file = '/home/peter/shakespeare.100-0.txt' # 'data.php'
    with open(file) as fh:
        testString = fh.read()

    def do():
        return "Means to immure herself and not be seen." in testString

    start = time.time()
    for i in range(RUNS):
        _ = do()
    duration = time.time() - start
    print("Python: %.2fs" % duration)
    print(do())
$ python3 --version
Python 3.6.5
$ python3 contains.py
Python: 24.79s
True
$ 

Go:

$ cat contains.go
package main

import (
    "fmt"
    "io/ioutil"
    "log"
    "strings"
    "time"
)

const (
    runs = 10000
)

func main() {
    // The Complete Works of William Shakespeare by William Shakespeare
    // http://www.gutenberg.org/files/100/100-0.txt
    fname := `/home/peter/shakespeare.100-0.txt` // "data.php"
    testdata := readFile(fname)
    needle := "Means to immure herself and not be seen."
    start := time.Now()

    for i := 0; i < runs; i++ {
        _ = strings.Contains(testdata, needle)

    }
    duration := time.Now().Sub(start)
    fmt.Printf("Go: %.2fs\n", duration.Seconds())

    fmt.Println(strings.Contains(testdata, needle))
    fmt.Println(strings.Index(testdata, needle))

}

func readFile(fname string) string {
    data, err := ioutil.ReadFile(fname)
    if err != nil {
        log.Fatal(err)
    }
    return string(data)
}
$ go version
go version devel +5332b5e75a Tue Jul 31 15:44:37 2018 +0000 linux/amd64
$ go run contains.go
Go: 5.47s
true
5837178
$ 

Upvotes: 2

Related Questions