Sergii S.
Sergii S.

Reputation: 449

Haskell. Fastest way to find similar elements in two Data.Map

I am trying to compare elements from two maps with a comparator function. Resulting solution should return a list of key value pairs from the first map that are similar to some values from the second one (only values are being compared). The input looks like following:

let a = fromList [(0,15),(1,150),(2,39),(3,18)]
let b = fromList [(0,151),(1,39),(2,0),(3,1)]

-- Function that performs comparison. Returns True if items are similar
isSimilar :: Int -> Int -> Bool
isSimilar a b = abs (b - a) <= 1 

The result should be following:

result = [(1,150),(2,39)]

I ended up with following implementation:

import qualified Data.IntMap.Strict as M
import Test.Hspec

type Elem = M.IntMap Int


-- Function that does all the work
getSimilarities :: Elem -> Elem -> [(Int,Int)]
getSimilarities cur intersected = M.foldrWithKey (\k p lst -> if M.null (check p)
                                               then lst
                                               else (k, p) : lst) [] cur
    where check pnt = M.filter (isSimilar pnt) intersected

-- Comparator that helps to filter out unnecessary values
isSimilar :: Int -> Int -> Bool
isSimilar a b = abs (b - a) <= 1 

-- Main function that runs the test
main :: IO ()
main = hspec spec

spec :: Spec
spec = do
        it "Get similarities test" $ do
            let xs        = [3,31,0,151,25,120]
            let mp        = M.fromList $ zip [0..(length xs)] xs

            let xs'        = [5,17,32,150,9,90]
            let mp'        = M.fromList $ zip [0..(length xs')] xs'

            let expected   = [(1, 31),(3, 151)]
            getSimilarities mp mp' `shouldBe` expected

However, I do not really like "check" function as I do not need to form a new map there, I just need to check value. Could you please suggest if there is a more efficient and performant way of doing what I need? Thanks.

Upvotes: 1

Views: 348

Answers (1)

Daniel Wagner
Daniel Wagner

Reputation: 153152

If you want to work with any isSimilar function, you can't do asymptotically better than your proposal. But for specific isSimilar functions, it is often possible to do better. In the specific case of

isSimilar a b = abs (b - a) <= 1

you can build a set of all values near your second collection's values and do set intersection, like this:

import Data.Semigroup
import Data.Set (Set)
import qualified Data.Set as S

getSimilarities :: Set (Arg Int Int) -> [Int] -> Set (Arg Int Int)
getSimilarities cur intersected = S.intersection cur . S.fromList $
    [ Arg (i+di) (error "The impossible happened: ignored Arg values were inspected in getSimilarities")
    | i <- intersected
    , di <- [-1, 0, 1]
    ]

See it go in ghci:

> xs = [3,31,0,151,25,120]
> mp = S.fromList $ zipWith Arg xs [0..]
> xs' = [5,17,32,150,9,90]
> getSimilarities mp xs'
fromList [Arg 31 1,Arg 151 3]

I leave deciding whether you actually want to use maps and tuples instead of sets and Args, and functions for converting between them, to the reader.

Upvotes: 1

Related Questions