How to define a parameterized similarity class (an ==-like operator with 3rd param) in Haskell?

Question

How to derive a parameterized similarity in a way that it would be convenient to use in Haskell?

The class should be such that the domain can be numeric or text (and possibly something else), and the parameter controlling the internals of comparison function can also be of several types.

Below, you may find the one approach that uses two class parameters. What implications this design entails if the goal is to define several "similarity or equality groups"? (What kind of use cases would be hard to implement compared to some alternative implementation?) In this example, the similarity groups of words could be defined to be edit distances of one, two etc. and in double to be different precisions.

Some of the methods take both numeric and textual inputs like the "quiteSimilar"-method. Why not use just some distance? Some of the similarities should be able to be defined by the user of the parameterized equality, e.g. on text (words) they could be based on synonyms.

And on doubles, well, I don't know yet, what kind of comparisons will be needed. (Suggestions are welcome.) After equalities comes the question, how to compare the order of items so that similar items will be deemed to be equal and not the larger and smaller, see the last line of the output.

{-# LANGUAGE MultiParamTypeClasses #-}

import Data.Array
import qualified Data.Text as T

-- parameterized eq
class Peq a b where peq :: a -> b -> b -> Bool

instance Peq Double Double where peq = almostEqRelPrec
instance Peq Int    T.Text where peq = editDistance 

class Comment a where
  quiteSimilar :: a -> a -> T.Text

instance Comment Double where
  quiteSimilar a b = if peq (epsilon * 100::Double) a b then T.pack "alike" else T.pack "unalike"

instance Comment T.Text where
  quiteSimilar a b = if peq (1::Int) a b then T.pack "alike" else T.pack "unalike"

x1' x = quiteSimilar 0.25 (0.25 - x * epsilon :: Double)
x1 = quiteSimilar 0.25 (0.25 - 25 * epsilon :: Double)
x2 = quiteSimilar 0.25 (0.25 - 26 * epsilon :: Double)
x3' x = quiteSimilar 1e12 (1e12 - x * ulp 1e12 :: Double)
x3 = quiteSimilar 1e12 (1e12 - 181 * ulp 1e12 :: Double)
x4 = quiteSimilar 1e12 (1e12 - 182 * ulp 1e12 :: Double)
u181 = 181 * ulp 1e12 :: Double

main = do
  let a = 0.2 + 0.65 :: Double
      b = 0.85 :: Double
      s = T.pack "trial" 
      t = T.pack "tr1al"

  putStrLn $ "0.2 + 0.65 = " ++ show a ++ " and compared to " ++ show b ++ ", it is " ++ T.unpack (quiteSimilar a b) 
  putStrLn $ "Texts " ++ T.unpack s ++ " and " ++ T.unpack t ++ " are " ++ T.unpack (quiteSimilar s t)
  putStrLn $ "Note that " ++ show a ++ " > " ++ show b ++ " is " ++ show (a > b)

-- packege Numeric.Limits contains this one
epsilon :: RealFloat a => a
epsilon = r
  where r = 1 - encodeFloat (m-1) e
        (m, e) = decodeFloat (1 `asTypeOf` r)

ulp :: RealFloat a => a -> a
ulp a = r
  where r = a - encodeFloat (m-1) e
        (m, e) = decodeFloat (a `asTypeOf` r)

almostEqRelPrec :: (RealFloat a) => a -> a -> a -> Bool
almostEqRelPrec maxRelPrec a b = d <= (largest * maxRelPrec)
  where
    d = abs $ a - b
    largest = max (abs a) (abs b)

editDistance :: Int -> T.Text -> T.Text -> Bool
editDistance i a b = i == editDistance' (show a) (show b)


-- from https://wiki.haskell.org/Edit_distance
-- see also https://hackage.haskell.org/package/edit-distance-0.2.2.1
editDistance' :: Eq a => [a] -> [a] -> Int
editDistance' xs ys = table ! (m,n)
    where
    (m,n) = (length xs, length ys)
    x     = array (1,m) (zip [1..] xs)
    y     = array (1,n) (zip [1..] ys)
    table :: Array (Int,Int) Int
    table = array bnds [(ij, dist ij) | ij <- range bnds]
    bnds  = ((0,0),(m,n))

    dist (0,j) = j
    dist (i,0) = i
    dist (i,j) = minimum [table ! (i-1,j) + 1, table ! (i,j-1) + 1,
        if x ! i == y ! j then table ! (i-1,j-1) else 1 + table ! (i-1,j-1)]

On my machine, the output is:

  0.2 + 0.65 = 0.8500000000000001 and compared to 0.85, it is alike
  Texts trial and tr1al are alike
  Note that 0.8500000000000001 > 0.85 is True

Edit:

Trying to rephrase the question: could this be achieved more elegantly with a similarity class that has only one parameter a and not two (a and b)? I have a feeling that multiparameter classes may turn out to be difficult later on. Is this a needless fear? First solution along this line that came to my mind was to define similarity class with one parameter a and a class for functions having two parameters. And on instances constraint other type to be similarity class parameter and the other would be for actual method returning Bool.

Are there some benefits of using the latter approach to the one presented? Or actually what are the possible trade-offs between these approaches? And if there are still more ways to make achieve this kind of things, how do they compare?

leftaroundabout · Accepted Answer

could this be achieved more elegantly with a similarity class that has only one parameter a and not two (a and b)

Yes. Many MultiParamTypeClasses can be rewritten quite easily to single-param ones... by simply degrading the second parameter to an associated type family:

{-# LANGUAGE TypeFamilies #-}

class Peq b where
  type SimilarityThreshold b :: *
  peq :: SimilarityThreshold b -> b -> b -> Bool

instance Peq Double where
  type SimilarityThreshold Double = Double
  peq = almostEqRelPrec

instance Peq T.Text where
  type SimilarityThreshold T.Text = Int
  peq = editDistance

This is quite a bit more verbose, but indeed I tend to favour this style. The main difference is that the associated type family always assigng each type of values to be compared unambiguously a threshold-type. This can save you some could not deduce... type inference trouble, however it also means that you can't use two different metric-types for a single type (but why would you, anyway).

Note that you can achieve exactly the same semantics by simply adding a fundep to your original class:

{-# LANGUAGE FunctionalDependencies #-}

class Peq a b | b -> a where
  peq :: a -> b -> b -> Bool

This is just a bit different in usage – again I tend to favour the type families approach: it is more explicit in what the parameters are for, while at the same time avoiding the second parameter to turn up in the constraints to any Peq-polymorphic function.

How to define a parameterized similarity class (an ==-like operator with 3rd param) in Haskell?

Answers (1)

Related Questions