Reputation: 20120
In an effort to become better with Haskell, I'm rewriting a small CLI that I developed originally in Python. The CLI mostly makes GET requests to an API and allows for filtering/formatting the JSON result.
I'm finding my Haskell version to be a lot slower than my Python one.
To help narrow down the problem, I excluded all parts of my Haskell code except the fetching of data - essentially, it's this:
import Data.Aeson
import qualified Data.ByteString.Char8 as BC
import Data.List (intercalate)
import Network.HTTP.Simple
...
-- For testing purposes
getUsers :: [Handle] -> IO ()
getUsers hs = do
let handles = BC.pack $ intercalate ";" hs
req <- parseRequest (baseUrl ++ "/user.info")
let request = setRequestQueryString [("handles", Just handles)] $ req
response <- httpJSON request
let (usrs :: Maybe (MyApiResponseType [User])) = getResponseBody response
print usrs
And I'm using the following dependencies:
dependencies:
- base >= 4.7 && < 5
- aeson
- bytestring
- http-conduit
To test this, I timed how long it takes for my Haskell program to retrieve data for a particular user (without any particular formatting). I compared it with my Python version (which formats the data), and Curl (which I piped into jq
to format the data):
I ran each 5 times and took the average of the 3 middle values, excluding the highest and lowest times:
Haskell Python Curl
real: 1017 ms 568 ms 214 ms
user: 1062 ms 367 ms 26 ms
sys: 210 ms 45 ms 10 ms
Ok, so the Haskell version is definitely slower. Next I tried profiling tools to narrow down the cause of the problem.
I profiled the code using an SCC
annotation for the function above:
> stack build --profile
...
> stack exec --profile -- my-cli-exe +RTS -p -sstderr
...
244,904,040 bytes allocated in the heap
27,759,640 bytes copied during GC
5,771,840 bytes maximum residency (6 sample(s))
245,912 bytes maximum slop
28 MiB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 228 colls, 228 par 0.849s 0.212s 0.0009s 0.0185s
Gen 1 6 colls, 5 par 0.090s 0.023s 0.0038s 0.0078s
Parallel GC work balance: 30.54% (serial 0%, perfect 100%)
TASKS: 21 (1 bound, 20 peak workers (20 total), using -N8)
SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
INIT time 0.004s ( 0.003s elapsed)
MUT time 0.881s ( 0.534s elapsed)
GC time 0.939s ( 0.235s elapsed)
RP time 0.000s ( 0.000s elapsed)
PROF time 0.000s ( 0.000s elapsed)
EXIT time 0.010s ( 0.001s elapsed)
Total time 1.833s ( 0.773s elapsed)
Alloc rate 277,931,867 bytes per MUT second
Productivity 48.1% of total user, 69.1% of total elapsed
Seems like a lot of time is being spent in garbage collection.
I looked at the generated .prof
file, which gave this:
COST CENTRE MODULE SRC %time %alloc
>>=.\.ks' Data.ASN1.Get Data/ASN1/Get.hs:104:13-61 10.2 9.8
fromBase64.decode4 Data.Memory.Encoding.Base64 Data/Memory/Encoding/Base64.hs:(299,9)-(309,37) 9.0 12.3
>>=.\ Data.ASN1.Parse Data/ASN1/Parse.hs:(54,9)-(56,43) 5.4 0.7
fromBase64.loop Data.Memory.Encoding.Base64 Data/Memory/Encoding/Base64.hs:(264,9)-(296,45) 4.2 7.4
>>=.\ Data.ASN1.Get Data/ASN1/Get.hs:(104,9)-(105,38) 4.2 3.5
decodeSignedObject.onContainer Data.X509.Signed Data/X509/Signed.hs:(167,9)-(171,30) 3.6 2.9
runParseState.go Data.ASN1.BinaryEncoding.Parse Data/ASN1/BinaryEncoding/Parse.hs:(98,12)-(129,127) 3.0 3.2
getConstructedEndRepr.getEnd Data.ASN1.Stream Data/ASN1/Stream.hs:(37,11)-(41,82) 3.0 12.7
getConstructedEnd Data.ASN1.Stream Data/ASN1/Stream.hs:(23,1)-(28,93) 3.0 7.8
readCertificates Data.X509.CertificateStore Data/X509/CertificateStore.hs:(92,1)-(96,33) 3.0 2.2
fmap.\.ks' Data.ASN1.Get Data/ASN1/Get.hs:88:13-52 1.8 2.2
decodeConstruction Data.ASN1.BinaryEncoding Data/ASN1/BinaryEncoding.hs:(48,1)-(50,66) 1.8 0.0
fmap Data.ASN1.Parse Data/ASN1/Parse.hs:41:5-57 1.8 1.0
concat.loopCopy Data.ByteArray.Methods Data/ByteArray/Methods.hs:(210,5)-(215,28) 1.2 0.4
fromBase64.rset Data.Memory.Encoding.Base64 Data/Memory/Encoding/Base64.hs:(312,9)-(314,53) 1.2 0.0
localTimeParseE.allDigits Data.Hourglass.Format Data/Hourglass/Format.hs:358:9-37 1.2 0.3
getWord8 Data.ASN1.Get Data/ASN1/Get.hs:(200,1)-(204,43) 1.2 0.0
fmap.\ Data.ASN1.Get Data/ASN1/Get.hs:(88,9)-(89,38) 1.2 0.6
runParseState.runGetHeader.\ Data.ASN1.BinaryEncoding.Parse Data/ASN1/BinaryEncoding/Parse.hs:131:44-66 1.2 0.0
mplusEither Data.ASN1.BinaryEncoding.Parse Data/ASN1/BinaryEncoding/Parse.hs:(67,1)-(70,45) 1.2 4.9
getOID.groupOID Data.ASN1.Prim Data/ASN1/Prim.hs:299:9-92 1.2 0.3
getConstructedEndRepr.getEnd.zs Data.ASN1.Stream Data/ASN1/Stream.hs:40:48-73 1.2 0.0
getConstructedEndRepr.getEnd.(...) Data.ASN1.Stream Data/ASN1/Stream.hs:40:48-73 1.2 0.4
getConstructedEnd.(...) Data.ASN1.Stream Data/ASN1/Stream.hs:28:48-80 1.2 0.3
decodeEventASN1Repr.loop Data.ASN1.BinaryEncoding Data/ASN1/BinaryEncoding.hs:(54,11)-(67,69) 1.2 2.5
put Data.ASN1.Parse Data/ASN1/Parse.hs:(72,1)-(74,24) 1.2 0.0
fromASN1 Data.X509.ExtensionRaw Data/X509/ExtensionRaw.hs:(55,5)-(61,71) 1.2 0.0
compare Data.X509.DistinguishedName Data/X509/DistinguishedName.hs:31:23-25 1.2 0.0
putBinaryVersion Network.TLS.Packet Network/TLS/Packet.hs:(109,1)-(110,41) 1.2 0.0
parseLBS.onSuccess Data.ASN1.BinaryEncoding.Parse Data/ASN1/BinaryEncoding/Parse.hs:(147,11)-(149,64) 0.6 1.7
pemParseLBS Data.PEM.Parser Data/PEM/Parser.hs:(92,1)-(97,41) 0.6 1.0
runParseState.terminateAugment Data.ASN1.BinaryEncoding.Parse Data/ASN1/BinaryEncoding/Parse.hs:(87,12)-(93,53) 0.0 1.7
parseOnePEM.getPemContent Data.PEM.Parser Data/PEM/Parser.hs:(56,9)-(64,93) 0.0 1.8
This doesn't seem too bad, and when I scrolled down to functions I had defined and they didn't seem to be taking much time either.
This leads me to believe it's a memory leak problem(?), so I profiled the heap:
stack exec --profile -- my-cli-exe +RTS -h
hp2ps my-cli-exe.hp
open my-cli.exe.ps
So it seems as though lots of space is being allocated on the heap, and then suddenly cleared.
The main issue is, I'm not sure where to go from here. My function is relatively small and is only getting a small JSON response of around 500 bytes. So where could the issue be coming from?
Upvotes: 1
Views: 170
Reputation: 20120
It seemed odd that the performance of a common Haskell library was so slow for me, but somehow this approach solved my concerns:
I found that the performance of my executable was faster when I used stack install
to copy the binaries:
stack install
my-cli-exe
instead of using build
and run
.
Here are the running times again for comparison:
HS (stack install) HS (stack run) Python Curl
real: 373 ms 1017 ms 568 ms 214 ms
user: 222 ms 1062 ms 367 ms 26 ms
sys: 141 ms 210 ms 45 ms 10 ms
Upvotes: 0