Neil Mitchell

Reputation: 9250

Graphing criterion benchmarks taking different orders of magnitude of time

I have a Criterion benchmark where each bgroup corresponds to a test, and within each bgroup there are two bench values of the test with different options. For example:

import Criterion.Main

main = defaultMain
    [bgroup "test1" [bench "v1" test1_1, bench "v2" test1_2]
    ,bgroup "test2" [bench "v1" test2_1, bench "v2" test2_2]
    -- lots more tests
    ]

Within each bgroup the two bench tests are comparable. However, test1 takes 2000 microseconds, while test2 takes 45 microseconds. The overview graph (which is most useful for what I want to do) displays all the tests on the same axes, so I can clearly see the differences in test1, but the differences in test2 are hard to see.

Is it possible to normalise each bgroup for plotting? Or show them on separate axes? Or should I dump the CSV data and plot what I want myself?
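
To make the normalisation idea concrete, here is a minimal sketch of what I mean, assuming the timings have already been read into (group, name, mean) triples, for instance from the CSV data. The Row type, normaliseByGroup and the sample figures are hypothetical names and made-up values for illustration, not part of Criterion:

```haskell
-- One timing: (bgroup name, bench name, mean time in microseconds).
-- This triple layout is an assumption for illustration, not a Criterion type.
type Row = (String, String, Double)

-- Divide every timing by the slowest timing in its own group, so each
-- group's slowest bench becomes 1.0 and the others are fractions of it.
normaliseByGroup :: [Row] -> [Row]
normaliseByGroup rows = [ (g, b, t / slowest g) | (g, b, t) <- rows ]
  where
    -- Never empty: g always comes from a row that is itself in rows.
    slowest g = maximum [ t | (g', _, t) <- rows, g' == g ]

-- Made-up figures on the same scale as above (not real measurements).
sample :: [Row]
sample =
  [ ("test1", "v1", 2000), ("test1", "v2", 1000)
  , ("test2", "v1", 45),   ("test2", "v2", 22.5)
  ]

main :: IO ()
main = mapM_ print (normaliseByGroup sample)
```

After this step every group spans the same 0 to 1 range, so test2 would be as readable as test1 on shared axes, at the cost of losing the absolute times.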

Upvotes: 7

Views: 347

Answers (2)

Nikita Volkov

Reputation: 43309

I have just released a library, criterion-plus. It is a wrapper library over "criterion" that addresses the issue you're experiencing, among others. It lets you declare multiple "standoffs", each of which generates an independent "criterion" report file. It also lets you exclude "setup/teardown" phases from the measurement, which "criterion" does not let you do.

Here is an example of how this library is supposed to be used:

import CriterionPlus
import qualified SomeMySQLLib as MySQL
import qualified SomePostgreSQLLib as PostgreSQL

main = 
  benchmark $ do
    standoff "Inserting rows" $ do
      subject "MySQL" $ do
        -- Exclude the "setup" phase from measurement:
        pause
        connection <- liftIO $ MySQL.openConnection
        -- Measure what we want:
        continue
        liftIO $ MySQL.insertAThousandRows connection
        -- Exclude the "teardown" phase from measurement:
        pause
        liftIO $ MySQL.closeConnection connection
      subject "PostgreSQL" $ do
        -- This is how we can exclude the "setup" phase from monad transformers:
        pause
        PostgreSQL.runSession $ do
          lift $ continue
          PostgreSQL.insertAThousandRows
          -- Exclude "teardown":
          lift $ pause
    -- Each standoff generates an independent report file:
    standoff "Querying" $ do
      subject "MySQL" $ error "So on..."
      subject "PostgreSQL" $ error "So on..."

Upvotes: 2

Nikita Volkov

Reputation: 43309

This issue is definitely amongst the shortcomings of Criterion. I've been bitten by the same problem multiple times.

The standard approach I take to work around this is to build a separate executable for each comparison unit. A dedicated benchmark target has been added in recent versions of Cabal, so I declare one benchmark target per comparison unit in the .cabal file. Then I can run each comparison using cabal bench [target-name]. Yes, it is far from convenient, but it's the best I could come up with.

Upvotes: 3
