Reputation: 809
How to derive a parameterized similarity in a way that it would be convenient to use in Haskell?
The class should be such that the domain can be numeric or text (and possibly something else), and the parameter controlling the internals of comparison function can also be of several types.
Below, you may find the one approach that uses two class parameters. What implications this design entails if the goal is to define several "similarity or equality groups"? (What kind of use cases would be hard to implement compared to some alternative implementation?) In this example, the similarity groups of words could be defined to be edit distances of one, two etc. and in double to be different precisions.
Some of the methods take both numeric and textual inputs like the "quiteSimilar"-method. Why not use just some distance? Some of the similarities should be able to be defined by the user of the parameterized equality, e.g. on text (words) they could be based on synonyms.
And on doubles, well, I don't know yet, what kind of comparisons will be needed. (Suggestions are welcome.) After equalities comes the question, how to compare the order of items so that similar items will be deemed to be equal and not the larger and smaller, see the last line of the output.
{-# LANGUAGE MultiParamTypeClasses #-}
import Data.Array
import qualified Data.Text as T
-- parameterized eq
class Peq a b where peq :: a -> b -> b -> Bool
instance Peq Double Double where peq = almostEqRelPrec
instance Peq Int T.Text where peq = editDistance
class Comment a where
quiteSimilar :: a -> a -> T.Text
instance Comment Double where
quiteSimilar a b = if peq (epsilon * 100::Double) a b then T.pack "alike" else T.pack "unalike"
instance Comment T.Text where
quiteSimilar a b = if peq (1::Int) a b then T.pack "alike" else T.pack "unalike"
x1' x = quiteSimilar 0.25 (0.25 - x * epsilon :: Double)
x1 = quiteSimilar 0.25 (0.25 - 25 * epsilon :: Double)
x2 = quiteSimilar 0.25 (0.25 - 26 * epsilon :: Double)
x3' x = quiteSimilar 1e12 (1e12 - x * ulp 1e12 :: Double)
x3 = quiteSimilar 1e12 (1e12 - 181 * ulp 1e12 :: Double)
x4 = quiteSimilar 1e12 (1e12 - 182 * ulp 1e12 :: Double)
u181 = 181 * ulp 1e12 :: Double
main = do
let a = 0.2 + 0.65 :: Double
b = 0.85 :: Double
s = T.pack "trial"
t = T.pack "tr1al"
putStrLn $ "0.2 + 0.65 = " ++ show a ++ " and compared to " ++ show b ++ ", it is " ++ T.unpack (quiteSimilar a b)
putStrLn $ "Texts " ++ T.unpack s ++ " and " ++ T.unpack t ++ " are " ++ T.unpack (quiteSimilar s t)
putStrLn $ "Note that " ++ show a ++ " > " ++ show b ++ " is " ++ show (a > b)
-- packege Numeric.Limits contains this one
epsilon :: RealFloat a => a
epsilon = r
where r = 1 - encodeFloat (m-1) e
(m, e) = decodeFloat (1 `asTypeOf` r)
ulp :: RealFloat a => a -> a
ulp a = r
where r = a - encodeFloat (m-1) e
(m, e) = decodeFloat (a `asTypeOf` r)
almostEqRelPrec :: (RealFloat a) => a -> a -> a -> Bool
almostEqRelPrec maxRelPrec a b = d <= (largest * maxRelPrec)
where
d = abs $ a - b
largest = max (abs a) (abs b)
editDistance :: Int -> T.Text -> T.Text -> Bool
editDistance i a b = i == editDistance' (show a) (show b)
-- from https://wiki.haskell.org/Edit_distance
-- see also https://hackage.haskell.org/package/edit-distance-0.2.2.1
editDistance' :: Eq a => [a] -> [a] -> Int
editDistance' xs ys = table ! (m,n)
where
(m,n) = (length xs, length ys)
x = array (1,m) (zip [1..] xs)
y = array (1,n) (zip [1..] ys)
table :: Array (Int,Int) Int
table = array bnds [(ij, dist ij) | ij <- range bnds]
bnds = ((0,0),(m,n))
dist (0,j) = j
dist (i,0) = i
dist (i,j) = minimum [table ! (i-1,j) + 1, table ! (i,j-1) + 1,
if x ! i == y ! j then table ! (i-1,j-1) else 1 + table ! (i-1,j-1)]
On my machine, the output is:
0.2 + 0.65 = 0.8500000000000001 and compared to 0.85, it is alike
Texts trial and tr1al are alike
Note that 0.8500000000000001 > 0.85 is True
Edit:
Trying to rephrase the question: could this be achieved more elegantly with a similarity class that has only one parameter a and not two (a and b)? I have a feeling that multiparameter classes may turn out to be difficult later on. Is this a needless fear? First solution along this line that came to my mind was to define similarity class with one parameter a and a class for functions having two parameters. And on instances constraint other type to be similarity class parameter and the other would be for actual method returning Bool.
Are there some benefits of using the latter approach to the one presented? Or actually what are the possible trade-offs between these approaches? And if there are still more ways to make achieve this kind of things, how do they compare?
Upvotes: 0
Views: 121
Reputation: 120711
could this be achieved more elegantly with a similarity class that has only one parameter a and not two (a and b)
Yes. Many MultiParamTypeClasses can be rewritten quite easily to single-param ones... by simply degrading the second parameter to an associated type family:
{-# LANGUAGE TypeFamilies #-}
class Peq b where
type SimilarityThreshold b :: *
peq :: SimilarityThreshold b -> b -> b -> Bool
instance Peq Double where
type SimilarityThreshold Double = Double
peq = almostEqRelPrec
instance Peq T.Text where
type SimilarityThreshold T.Text = Int
peq = editDistance
This is quite a bit more verbose, but indeed I tend to favour this style. The main difference is that the associated type family always assigng each type of values to be compared unambiguously a threshold-type. This can save you some could not deduce... type inference trouble, however it also means that you can't use two different metric-types for a single type (but why would you, anyway).
Note that you can achieve exactly the same semantics by simply adding a fundep to your original class:
{-# LANGUAGE FunctionalDependencies #-}
class Peq a b | b -> a where
peq :: a -> b -> b -> Bool
This is just a bit different in usage – again I tend to favour the type families approach: it is more explicit in what the parameters are for, while at the same time avoiding the second parameter to turn up in the constraints to any Peq
-polymorphic function.
Upvotes: 2