Farbod Salamat-Zadeh
Farbod Salamat-Zadeh

Reputation: 20120

Improving Haskell performance for small GET requests

In an effort to become better with Haskell, I'm rewriting a small CLI that I developed originally in Python. The CLI mostly makes GET requests to an API and allows for filtering/formatting the JSON result.

I'm finding my Haskell version to be a lot slower than my Python one.

To help narrow down the problem, I excluded all parts of my Haskell code except the fetching of data - essentially, it's this:

import           Data.Aeson
import qualified Data.ByteString.Char8 as BC
import           Data.List (intercalate)
import           Network.HTTP.Simple

...

-- For testing purposes
getUsers :: [Handle] -> IO ()
getUsers hs = do
    let handles = BC.pack $ intercalate ";" hs

    req <- parseRequest (baseUrl ++ "/user.info")
    let request = setRequestQueryString [("handles", Just handles)] $ req

    response <- httpJSON request

    let (usrs :: Maybe (MyApiResponseType [User])) = getResponseBody response
    print usrs

And I'm using the following dependencies:

dependencies:
  - base >= 4.7 && < 5
  - aeson
  - bytestring
  - http-conduit

To test this, I timed how long it takes for my Haskell program to retrieve data for a particular user (without any particular formatting). I compared it with my Python version (which formats the data), and Curl (which I piped into jq to format the data):

I ran each 5 times and took the average of the 3 middle values, excluding the highest and lowest times:

        Haskell   Python    Curl
real:   1017 ms   568 ms    214 ms
user:   1062 ms   367 ms    26 ms
sys:    210 ms    45 ms     10 ms

Ok, so the Haskell version is definitely slower. Next I tried profiling tools to narrow down the cause of the problem.

I profiled the code using an SCC annotation for the function above:

> stack build --profile
...
> stack exec --profile -- my-cli-exe +RTS -p -sstderr
...

     244,904,040 bytes allocated in the heap
      27,759,640 bytes copied during GC
       5,771,840 bytes maximum residency (6 sample(s))
         245,912 bytes maximum slop
              28 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       228 colls,   228 par    0.849s   0.212s     0.0009s    0.0185s
  Gen  1         6 colls,     5 par    0.090s   0.023s     0.0038s    0.0078s

  Parallel GC work balance: 30.54% (serial 0%, perfect 100%)

  TASKS: 21 (1 bound, 20 peak workers (20 total), using -N8)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.004s  (  0.003s elapsed)
  MUT     time    0.881s  (  0.534s elapsed)
  GC      time    0.939s  (  0.235s elapsed)
  RP      time    0.000s  (  0.000s elapsed)
  PROF    time    0.000s  (  0.000s elapsed)
  EXIT    time    0.010s  (  0.001s elapsed)
  Total   time    1.833s  (  0.773s elapsed)

  Alloc rate    277,931,867 bytes per MUT second

  Productivity  48.1% of total user, 69.1% of total elapsed

Seems like a lot of time is being spent in garbage collection.

I looked at the generated .prof file, which gave this:

COST CENTRE                        MODULE                         SRC                                                  %time %alloc

>>=.\.ks'                          Data.ASN1.Get                  Data/ASN1/Get.hs:104:13-61                            10.2    9.8
fromBase64.decode4                 Data.Memory.Encoding.Base64    Data/Memory/Encoding/Base64.hs:(299,9)-(309,37)        9.0   12.3
>>=.\                              Data.ASN1.Parse                Data/ASN1/Parse.hs:(54,9)-(56,43)                      5.4    0.7
fromBase64.loop                    Data.Memory.Encoding.Base64    Data/Memory/Encoding/Base64.hs:(264,9)-(296,45)        4.2    7.4
>>=.\                              Data.ASN1.Get                  Data/ASN1/Get.hs:(104,9)-(105,38)                      4.2    3.5
decodeSignedObject.onContainer     Data.X509.Signed               Data/X509/Signed.hs:(167,9)-(171,30)                   3.6    2.9
runParseState.go                   Data.ASN1.BinaryEncoding.Parse Data/ASN1/BinaryEncoding/Parse.hs:(98,12)-(129,127)    3.0    3.2
getConstructedEndRepr.getEnd       Data.ASN1.Stream               Data/ASN1/Stream.hs:(37,11)-(41,82)                    3.0   12.7
getConstructedEnd                  Data.ASN1.Stream               Data/ASN1/Stream.hs:(23,1)-(28,93)                     3.0    7.8
readCertificates                   Data.X509.CertificateStore     Data/X509/CertificateStore.hs:(92,1)-(96,33)           3.0    2.2
fmap.\.ks'                         Data.ASN1.Get                  Data/ASN1/Get.hs:88:13-52                              1.8    2.2
decodeConstruction                 Data.ASN1.BinaryEncoding       Data/ASN1/BinaryEncoding.hs:(48,1)-(50,66)             1.8    0.0
fmap                               Data.ASN1.Parse                Data/ASN1/Parse.hs:41:5-57                             1.8    1.0
concat.loopCopy                    Data.ByteArray.Methods         Data/ByteArray/Methods.hs:(210,5)-(215,28)             1.2    0.4
fromBase64.rset                    Data.Memory.Encoding.Base64    Data/Memory/Encoding/Base64.hs:(312,9)-(314,53)        1.2    0.0
localTimeParseE.allDigits          Data.Hourglass.Format          Data/Hourglass/Format.hs:358:9-37                      1.2    0.3
getWord8                           Data.ASN1.Get                  Data/ASN1/Get.hs:(200,1)-(204,43)                      1.2    0.0
fmap.\                             Data.ASN1.Get                  Data/ASN1/Get.hs:(88,9)-(89,38)                        1.2    0.6
runParseState.runGetHeader.\       Data.ASN1.BinaryEncoding.Parse Data/ASN1/BinaryEncoding/Parse.hs:131:44-66            1.2    0.0
mplusEither                        Data.ASN1.BinaryEncoding.Parse Data/ASN1/BinaryEncoding/Parse.hs:(67,1)-(70,45)       1.2    4.9
getOID.groupOID                    Data.ASN1.Prim                 Data/ASN1/Prim.hs:299:9-92                             1.2    0.3
getConstructedEndRepr.getEnd.zs    Data.ASN1.Stream               Data/ASN1/Stream.hs:40:48-73                           1.2    0.0
getConstructedEndRepr.getEnd.(...) Data.ASN1.Stream               Data/ASN1/Stream.hs:40:48-73                           1.2    0.4
getConstructedEnd.(...)            Data.ASN1.Stream               Data/ASN1/Stream.hs:28:48-80                           1.2    0.3
decodeEventASN1Repr.loop           Data.ASN1.BinaryEncoding       Data/ASN1/BinaryEncoding.hs:(54,11)-(67,69)            1.2    2.5
put                                Data.ASN1.Parse                Data/ASN1/Parse.hs:(72,1)-(74,24)                      1.2    0.0
fromASN1                           Data.X509.ExtensionRaw         Data/X509/ExtensionRaw.hs:(55,5)-(61,71)               1.2    0.0
compare                            Data.X509.DistinguishedName    Data/X509/DistinguishedName.hs:31:23-25                1.2    0.0
putBinaryVersion                   Network.TLS.Packet             Network/TLS/Packet.hs:(109,1)-(110,41)                 1.2    0.0
parseLBS.onSuccess                 Data.ASN1.BinaryEncoding.Parse Data/ASN1/BinaryEncoding/Parse.hs:(147,11)-(149,64)    0.6    1.7
pemParseLBS                        Data.PEM.Parser                Data/PEM/Parser.hs:(92,1)-(97,41)                      0.6    1.0
runParseState.terminateAugment     Data.ASN1.BinaryEncoding.Parse Data/ASN1/BinaryEncoding/Parse.hs:(87,12)-(93,53)      0.0    1.7
parseOnePEM.getPemContent          Data.PEM.Parser                Data/PEM/Parser.hs:(56,9)-(64,93)                      0.0    1.8

This doesn't seem too bad, and when I scrolled down to functions I had defined and they didn't seem to be taking much time either.

This leads me to believe it's a memory leak problem(?), so I profiled the heap:

stack exec --profile -- my-cli-exe +RTS -h
hp2ps my-cli-exe.hp
open my-cli.exe.ps

heap

So it seems as though lots of space is being allocated on the heap, and then suddenly cleared.


The main issue is, I'm not sure where to go from here. My function is relatively small and is only getting a small JSON response of around 500 bytes. So where could the issue be coming from?

Upvotes: 1

Views: 170

Answers (1)

Farbod Salamat-Zadeh
Farbod Salamat-Zadeh

Reputation: 20120

It seemed odd that the performance of a common Haskell library was so slow for me, but somehow this approach solved my concerns:

I found that the performance of my executable was faster when I used stack install to copy the binaries:

stack install
my-cli-exe

instead of using build and run.

Here are the running times again for comparison:

        HS (stack install)  HS (stack run)   Python    Curl
real:   373 ms              1017 ms          568 ms    214 ms
user:   222 ms              1062 ms          367 ms    26 ms
sys:    141 ms              210 ms           45 ms     10 ms

Upvotes: 0

Related Questions