Reputation: 6798
In my kdtree
project I just replaced a depth counter from being Int
-based to an explicit Key a
based on the type a
in KDTree v a
. This is the diff.
Now while I believe this should be a type-level change only my benchmarks show a sharp drop in performance:
Before:
benchmarking nr/kdtree_nr
mean: 60.19084 us, lb 59.87414 us, ub 60.57270 us, ci 0.950
std dev: 1.777527 us, lb 1.494657 us, ub 2.120168 us, ci 0.950
After:
benchmarking nr/kdtree_nr
mean: 556.9518 us, lb 554.0586 us, ub 560.6128 us, ci 0.950
std dev: 16.70620 us, lb 13.58185 us, ub 20.63450 us, ci 0.950
Before I dive into Core ... anyone has any idea what's going on here?
As proposed by Thomas (and userxyz) I replaced data Key a :: *
with type Key a :: *
and changed the implementation accordingly. This hasn't had any significent impact on the result:
benchmarking nr/kdtree_nr
mean: 538.2789 us, lb 537.5128 us, ub 539.4408 us, ci 0.950
std dev: 4.745118 us, lb 3.454081 us, ub 6.969091 us, ci 0.950
Just had a quick look at the Core output. Apparently the change prevents functions depending on the class to be specialized, right?
Before:
lvl20 :: KDTree Vector (V3 Double) -> [V3 Double]
lvl20 =
\ (w4 :: KDTree Vector (V3 Double)) ->
$wpointsAround $fKDCompareV3_$s$fKDCompareV3 lvl2 lvl4 nrRadius q w4
After:
lvl18 :: KDTree Vector (V3 Double) -> [V3 Double]
lvl18 =
\ (w4 :: KDTree Vector (V3 Double)) ->
$wpointsAround $dKDCompare lvl1 lvl3 nrRadius q w4
Small Update to Edit 2: Going crazy with INLINE pragmas doesn't change a thing here.
Quickly implemented what userxyz suggested: http://lpaste.net/104457 Been there before, can't make it to work:
src/Data/KDTree.hs:48:49:
Could not deduce (k ~ KeyV3)
from the context (Real a, Floating a)
bound by the instance declaration at src/Data/KDTree.hs:45:10-49
or from (Key k)
bound by the type signature for
dimDistance :: Key k => k -> V3 a -> V3 a -> Double
at src/Data/KDTree.hs:47:3-13
‘k’ is a rigid type variable bound by
the type signature for
dimDistance :: Key k => k -> V3 a -> V3 a -> Double
at src/Data/KDTree.hs:47:3
Relevant bindings include
k :: k (bound at src/Data/KDTree.hs:47:15)
dimDistance :: k -> V3 a -> V3 a -> Double
(bound at src/Data/KDTree.hs:47:3)
In the pattern: V3X
In a case alternative: V3X -> ax - bx
In the second argument of ‘($)’, namely
‘case k of {
V3X -> ax - bx
V3Y -> ay - by
V3Z -> az - bz }’
Hmm ... I think I just "solved" the problem by just throwing SPECIALIZE pragmas at the functions. This in effect causes everything to be inlined and removes the explicit dictionary passing.
I am not too happy with that solution as this means I have to put a big "please specialize your calls to achieve decent performance" warning in the docs.
Upvotes: 14
Views: 239
Reputation: 6798
By sheer chance I just stumbled upon this question: Transitivity of Auto-Specialization in GHC
There the OP quotes "From the docs for GHC 7.6:" (emphasis mine):
[Y]ou often don't even need the SPECIALIZE pragma in the first place. When compiling a module M, GHC's optimiser (with -O) automatically considers each top-level overloaded function declared in M, and specialises it for the different types at which it is called in M. The optimiser also considers each imported INLINABLE overloaded function, and specialises it for the different types at which it is called in M.
As a result I have just removed all (hard) INLINE and SPECIALIZE pragmas and replaced them with INLINEABLE pragmas where appropriate (ie on each function that is used in the benchmark suite). As a result I get even better times than with inline pragmas on all functions.
Quintessence: let the compiler do it's work but give him a hint sometimes.
Upvotes: 1
Reputation: 14588
This may not be helpful because it doesn't address the real question, which is the code slows down, but you can make this work with type
instead of data
. The reason kFirst
and kSucc
don't work is because there is no way to deduce what a
is from their application, so there is no way to select an instance, since the instance depends only on a
and not Key a
. You can fix this by providing a witness for those functions:
class KDCompare a where
type Key a :: *
kSuccWith :: proxy a -> Key a -> Key a
kFirstWith :: proxy a -> Key a
Then modifying your functions accordingly:
kdtree :: (KDCompare a, G.Vector v a) => BucketSize -> v a -> KDTree v a
kdtree mb vs = ana (kdtreeF mb) (kFirstWith vs, vs)
kdtreeF :: (KDCompare a, G.Vector v a) => BucketSize -> (Key a,v a) -> KDTreeF v a (Key a,v a)
kdtreeF (BucketSize mb) (k0, v0) = go (k0, v0)
where go (k,fs) | G.length fs <= mb = LeafF (G.convert fs)
| otherwise = NodeF k (G.head r) (kSucc' k,l) (kSucc' k,r)
where (l,r) = splitBuckets k fs
kSucc' = kSuccWith v0
It would probably make more sense to just seperate Key
and KDCompare
:
class Enum a => Key a where
kSucc :: a -> a
kFirst :: a
class KDCompare a where
dimDistance :: Key key => key -> a -> a -> Double
realDistance :: a -> a -> Double
Then your datatype has to be parameterised over the key:
data KDTree k v a = Node k a (KDTree k v a) (KDTree k v a)
| Leaf (v a)
data KDTreeF k v a f = NodeF k a f f | LeafF (v a)
deriving (Functor)
But your functions can be written more naturally:
kdtree :: (Key key, KDCompare a, G.Vector v a) => BucketSize -> v a -> KDTree key v a
kdtree mb vs = ana (kdtreeF mb) (kFirst, vs)
Upvotes: 0