rem

Reputation: 893

aeson benchmark space leak (?) on citylots.json

I've been playing around with the Aeson parser's benchmark suite and got some surprising results when comparing its strict parser with the lazy one.

To get the benchmark running:

cd aeson; cabal install
cd benchmark; make
./AesonParse_p [blksz] [runs] [path-to-json] +RTS -p -hc

Here are some profiles I got with the profiling options (+RTS -p -hc) turned on:

From the profiles, the leak traces back to jstring_ in Data.Aeson.Parser.Internal. Does anyone know what's going on?

    Sun Nov 15 13:56 2015 Time and Allocation Profiling Report  (Final)

       AesonParse_p +RTS -p -hc -RTS 105532 700 json-data/jp100.json

    total time  =        3.42 secs   (3417 ticks @ 1000 us, 1 processor)
    total alloc = 4,191,614,072 bytes  (excludes profiling overheads)

COST CENTRE          MODULE                     %time %alloc

jstringbang_         Data.Aeson.Parser.Internal  96.4   93.1
arrayValues          Data.Aeson.Parser.Internal   1.7    1.9
object_'             Data.Aeson.Parser.Internal   1.1    1.9
main.\.\.loop.refill Main                         0.7    3.2


                                                                         individual     inherited
COST CENTRE               MODULE                       no.     entries  %time %alloc   %time %alloc

MAIN                      MAIN                         116           0    0.0    0.0   100.0  100.0
 main                     Main                         233           0    0.0    0.0   100.0  100.0
  main.blkSize            Main                         242           1    0.0    0.0     0.0    0.0
  main.count              Main                         239           1    0.0    0.0     0.0    0.0
  main.\                  Main                         234           1    0.0    0.0   100.0  100.0
   main.\.\               Main                         235           1    0.0    0.0   100.0  100.0
    main.\.\.rate         Main                         252           1    0.0    0.0     0.0    0.0
    main.\.\.loop         Main                         238         701    0.1    0.0   100.0  100.0
     object_'             Data.Aeson.Parser.Internal   245           0    1.1    1.9    99.2   96.8
      jstringbang_        Data.Aeson.Parser.Internal   247           0   96.4   93.1    98.1   94.9
       array_'            Data.Aeson.Parser.Internal   249           0    0.0    0.0     1.7    1.9
        arrayValues       Data.Aeson.Parser.Internal   251           0    1.7    1.9     1.7    1.9
     main.\.\.loop.refill Main                         241           1    0.7    3.2     0.7    3.2
 CAF                      GHC.IO.FD                    217           0    0.0    0.0     0.0    0.0
 CAF                      GHC.IO.Encoding.Iconv        215           0    0.0    0.0     0.0    0.0
 CAF                      Text.Read.Lex                209           0    0.0    0.0     0.0    0.0
 CAF                      GHC.IO.Handle.FD             202           0    0.0    0.0     0.0    0.0
 CAF                      GHC.Conc.Signal              198           0    0.0    0.0     0.0    0.0
 CAF                      GHC.IO.Handle.Text           191           0    0.0    0.0     0.0    0.0
 CAF                      GHC.IO.Encoding              188           0    0.0    0.0     0.0    0.0
 CAF                      Data.Time.Clock.UTC          129           0    0.0    0.0     0.0    0.0
 CAF                      Data.Aeson.Parser.Internal   124           0    0.0    0.0     0.0    0.0
  array_'                 Data.Aeson.Parser.Internal   248           1    0.0    0.0     0.0    0.0
   arrayValues            Data.Aeson.Parser.Internal   250           1    0.0    0.0     0.0    0.0
  jstringbang_            Data.Aeson.Parser.Internal   246           1    0.0    0.0     0.0    0.0
  object_'                Data.Aeson.Parser.Internal   244           1    0.0    0.0     0.0    0.0
 CAF                      Main                         123           0    0.0    0.0     0.0    0.0
  main                    Main                         232           1    0.0    0.0     0.0    0.0
   main.blkSize           Main                         243           0    0.0    0.0     0.0    0.0
   main.count             Main                         240           0    0.0    0.0     0.0    0.0
   main.\                 Main                         236           0    0.0    0.0     0.0    0.0
    main.\.\              Main                         237           0    0.0    0.0     0.0    0.0


Upvotes: 1

Views: 153

Answers (1)

ErikR

Reputation: 52049

From what I can tell, this should be expected.

The file citylots.json is a 167 MB file consisting of a single Object. The AesonParse program is building the entire object in memory, and that explains the memory ramp in the profile.

By contrast, the files companies.json and enron.json at http://jsonstudio.com/resources/ are "line-oriented" JSON files - each line is a JSON object and there are no commas between the objects. When you run AesonParse on either of these files it is only reading the first line.
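If you do want to parse every record in a line-oriented file, one option is to decode each line as its own JSON document. The following is only a minimal sketch, not what AesonParse does: decodeLines is a hypothetical helper name, and it assumes every record fits on a single line, as described above.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson (Value, decode)
import qualified Data.ByteString.Lazy.Char8 as BL

-- Hypothetical helper: treat every non-empty line as a separate JSON
-- document. A line that fails to parse yields Nothing.
decodeLines :: BL.ByteString -> [Maybe Value]
decodeLines = map decode . filter (not . BL.null) . BL.lines

main :: IO ()
main = do
  -- Two records, one per line, with no comma between them.
  let input = BL.unlines ["{\"a\":1}", "{\"b\":[1,2]}"]
  mapM_ print (decodeLines input)
```

Because each line is decoded on its own, a consumer that processes the results one at a time only needs one record's Value alive at once, unlike a single 167 MB object, which has to be built in full.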

Upvotes: 1
