Why does introducing associated types kill my performance?

Question

In my kdtree project I just replaced a depth counter from being Int-based to an explicit Key a based on the type a in KDTree v a. This is the diff.

Now while I believe this should be a type-level change only my benchmarks show a sharp drop in performance:

Before:

benchmarking nr/kdtree_nr 
mean: 60.19084 us, lb 59.87414 us, ub 60.57270 us, ci 0.950
std dev: 1.777527 us, lb 1.494657 us, ub 2.120168 us, ci 0.950

After:

benchmarking nr/kdtree_nr 
mean: 556.9518 us, lb 554.0586 us, ub 560.6128 us, ci 0.950 
std dev: 16.70620 us, lb 13.58185 us, ub 20.63450 us, ci 0.950

Before I dive into Core ... anyone has any idea what's going on here?

Edit 1

As proposed by Thomas (and userxyz) I replaced data Key a :: * with type Key a :: * and changed the implementation accordingly. This hasn't had any significent impact on the result:

benchmarking nr/kdtree_nr
mean: 538.2789 us, lb 537.5128 us, ub 539.4408 us, ci 0.950
std dev: 4.745118 us, lb 3.454081 us, ub 6.969091 us, ci 0.950

Edit 2

Just had a quick look at the Core output. Apparently the change prevents functions depending on the class to be specialized, right?

Before:

lvl20 :: KDTree Vector (V3 Double) -> [V3 Double]
lvl20 =
  \ (w4 :: KDTree Vector (V3 Double)) ->
    $wpointsAround $fKDCompareV3_$s$fKDCompareV3 lvl2 lvl4 nrRadius q w4

After:

lvl18 :: KDTree Vector (V3 Double) -> [V3 Double]
lvl18 =
  \ (w4 :: KDTree Vector (V3 Double)) ->
    $wpointsAround $dKDCompare lvl1 lvl3 nrRadius q w4

Small Update to Edit 2: Going crazy with INLINE pragmas doesn't change a thing here.

Edit 3

Quickly implemented what userxyz suggested: http://lpaste.net/104457 Been there before, can't make it to work:

src/Data/KDTree.hs:48:49:
    Could not deduce (k ~ KeyV3)
    from the context (Real a, Floating a)
      bound by the instance declaration at src/Data/KDTree.hs:45:10-49
    or from (Key k)
      bound by the type signature for
                 dimDistance :: Key k => k -> V3 a -> V3 a -> Double
      at src/Data/KDTree.hs:47:3-13
      ‘k’ is a rigid type variable bound by
          the type signature for
            dimDistance :: Key k => k -> V3 a -> V3 a -> Double
          at src/Data/KDTree.hs:47:3
    Relevant bindings include
      k :: k (bound at src/Data/KDTree.hs:47:15)
      dimDistance :: k -> V3 a -> V3 a -> Double
        (bound at src/Data/KDTree.hs:47:3)
    In the pattern: V3X
    In a case alternative: V3X -> ax - bx
    In the second argument of ‘($)’, namely
      ‘case k of {
         V3X -> ax - bx
         V3Y -> ay - by
         V3Z -> az - bz }’

Edit 4

Hmm ... I think I just "solved" the problem by just throwing SPECIALIZE pragmas at the functions. This in effect causes everything to be inlined and removes the explicit dictionary passing.

I am not too happy with that solution as this means I have to put a big "please specialize your calls to achieve decent performance" warning in the docs.

fho · Accepted Answer

By sheer chance I just stumbled upon this question: Transitivity of Auto-Specialization in GHC

There the OP quotes "From the docs for GHC 7.6:" (emphasis mine):

[Y]ou often don't even need the SPECIALIZE pragma in the first place. When compiling a module M, GHC's optimiser (with -O) automatically considers each top-level overloaded function declared in M, and specialises it for the different types at which it is called in M. The optimiser also considers each imported INLINABLE overloaded function, and specialises it for the different types at which it is called in M.

As a result I have just removed all (hard) INLINE and SPECIALIZE pragmas and replaced them with INLINEABLE pragmas where appropriate (ie on each function that is used in the benchmark suite). As a result I get even better times than with inline pragmas on all functions.

Quintessence: let the compiler do it's work but give him a hint sometimes.

Why does introducing associated types kill my performance?

Edit 1

Edit 2

Edit 3

Edit 4

Answers (2)

Related Questions