cmdv

Reputation: 1753

Haskell - Grouping specific nearest neighbours in a cartesian grid

I'm after some direction in this puzzle where I need to group specific nearest neighbours together.

My input data is:

myList :: Map (Int, Int) Int
myList =
  fromList
    [((-2,-2),0),((-2,-1),0),((-2,0),2),((-2,1),0),((-2,2),0)
    ,((-1,-2),1),((-1,-1),3),((-1,0),0),((-1,1),0),((-1,2),1)
    ,((0,-2),0),((0,-1),0),((0,0),0),((0,1),0),((0,2),0)
    ,((1,-2),0),((1,-1),0),((1,0),0),((1,1),2),((1,2),1)
    ,((2,-2),0),((2,-1),2),((2,0),0),((2,1),0),((2,2),0)]

This is a data representation of a 5 x 5 grid (brown is land, blue is water). I'm using (Int, Int) as XY coordinates because the list had to be generated in a spiral on a Cartesian coordinate grid, with (0,0) as the origin, and that determines its ordering. The remaining Int is the population size: 0 is water, 1..9 is land.

Because of the ordering of my Map, I've been struggling to find a way to traverse my data and return the 4 groups of land items, where items belong to the same group because they are connected by proximity (including diagonally). I'm looking for a result like the one below:

[ [(-1 , 2)]
, [(1, 2),(1,1)]
, [(-2, 0),(-1,-1),(-1,-2)]
, [(2, -1)]]

I've researched and tried various algorithms such as BFS and flood fill, but either my input data never fit their structural requirements or my understanding of the subject wasn't enough to adapt them to coordinates.

Is there a way I can run an algorithm directly on the data, or should I be looking at another direction?

I'm sorry there are no code examples of what I have so far, but I haven't been able to create anything remotely useful.

Upvotes: 5

Views: 578

Answers (2)

cmdv

Reputation: 1753

I ended up going with this solution by Chris Penner, via the FP Slack channel. It uses the union-find algorithm (I've added comments to the code to help a little):

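{-# LANGUAGE ScopedTypeVariables #-}  -- needed for the type annotations on bindings below

import Control.Arrow ((&&&))
import Control.Monad (void, (>=>))
import Data.Foldable (for_)
import Data.Maybe (catMaybes)
import Data.Traversable (for)
import Data.Tuple (swap)
import qualified Data.Map as M
import qualified Data.UnionFind.IO as U  -- assumed: from the union-find package

-- Assumed from the question: coordinates are XY pairs
type Coordinate = (Int, Int)

-- | Take a Map of land coordinates and return a list of grouped land items forming islands,
-- using the union-find algorithm
findIslands :: M.Map Coordinate Coordinate -> IO [[Coordinate]]
findIslands land = do
  -- create a fresh union-find point for every land coordinate
  pointMap <- traverse U.fresh land
  -- traverse each point, uniting it with any neighbours that are also land
  void . flip M.traverseWithKey pointMap $ \(x, y) point ->
      for_ (catMaybes (flip M.lookup pointMap <$> [(x + 1, y), (x, y + 1), (x + 1, y + 1), (x - 1, y + 1)]))
          $ \neighbourPoint ->
              U.union point neighbourPoint
  -- traverse pointMap, replacing each point with the descriptor of its representative
  withUnionKey :: (M.Map Coordinate Coordinate) <- for pointMap (U.repr >=> U.descriptor)
  -- swap the pairs around so the representative comes first
  let unionKeyToCoord :: [(Coordinate, Coordinate)] = (swap <$> M.toList withUnionKey)
      -- combine coordinates that share a representative to create islands
      results :: M.Map Coordinate [Coordinate] = M.fromListWith (<>) (fmap (:[]) <$> unionKeyToCoord)
  -- return just the elements from the Map
  return (M.elems results)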

convertTolandGrid :: [Coordinate] -> M.Map Coordinate Coordinate
convertTolandGrid = M.fromList . fmap (id &&& id)


Upvotes: 1

Daniel Wagner

Reputation: 152707

I recommend using a union-find data structure. Loop over all positions; if it is land, mark it equivalent to any positions immediately NE, N, NW, or W of it that are also land. (It will automatically get marked equivalent to any land that exists E, SW, S, or SE of it when you visit that other land. The critical property of the set D={NE, N, NW, W} is that if you mirror all the directions in D to get M, then M∪D contains every direction; any other set D with this property will do fine, too.) The equivalence classes returned by the data structure at the end of this process will be your connected land chunks.

If n is the total number of positions, this process is O(n*log n); the log n component comes from the Map lookups needed to determine if a neighbor is land or water.
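To make the loop above concrete, here is a minimal pure sketch that uses only Data.Map from containers. It keeps parent pointers in a Map instead of using a real union-find library, and it skips path compression and union by rank, so it does not aim for the bounds discussed here. The names Coord, Parents, findRoot, unionRoots, halfNeighbours, islands and sampleGrid are made up for illustration; sampleGrid is the grid from the question, and treating y + 1 as north is an assumption.

```haskell
import qualified Data.Map.Strict as M
import Data.List (foldl')

type Coord = (Int, Int)

-- | Parent pointers; a coordinate that maps to itself is a class representative.
type Parents = M.Map Coord Coord

-- | Follow parent pointers up to the representative (no path compression).
findRoot :: Parents -> Coord -> Coord
findRoot ps c = case M.lookup c ps of
  Just p | p /= c -> findRoot ps p
  _               -> c

-- | Merge two classes by pointing one representative at the other.
unionRoots :: Parents -> Coord -> Coord -> Parents
unionRoots ps a b
  | ra == rb  = ps
  | otherwise = M.insert ra rb ps
  where
    ra = findRoot ps a
    rb = findRoot ps b

-- | Half of the eight directions: NE, N, NW, W. Their mirrors cover the rest.
halfNeighbours :: Coord -> [Coord]
halfNeighbours (x, y) = [(x + 1, y + 1), (x, y + 1), (x - 1, y + 1), (x - 1, y)]

-- | Group land cells (population > 0) into 8-connected islands.
islands :: M.Map Coord Int -> [[Coord]]
islands grid = M.elems classes
  where
    land    = M.keys (M.filter (> 0) grid)
    initial = M.fromList [(c, c) | c <- land]
    -- visit every land cell and unite it with its land neighbours in the half set
    linked  = foldl' step initial land
    step ps c = foldl' (\acc n -> unionRoots acc c n) ps
                  [n | n <- halfNeighbours c, M.member n initial]
    -- bucket every land cell under its representative
    classes = M.fromListWith (flip (++)) [(findRoot linked c, [c]) | c <- land]

-- | The grid from the question.
sampleGrid :: M.Map Coord Int
sampleGrid =
  M.fromList
    [((-2,-2),0),((-2,-1),0),((-2,0),2),((-2,1),0),((-2,2),0)
    ,((-1,-2),1),((-1,-1),3),((-1,0),0),((-1,1),0),((-1,2),1)
    ,((0,-2),0),((0,-1),0),((0,0),0),((0,1),0),((0,2),0)
    ,((1,-2),0),((1,-1),0),((1,0),0),((1,1),2),((1,2),1)
    ,((2,-2),0),((2,-1),2),((2,0),0),((2,1),0),((2,2),0)]
```

Calling islands sampleGrid should give the four groups the question asks for, though the order of the groups and of the cells inside each group follows the Map's key order rather than the order shown in the question.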

You should consider making the Map sparse if you can -- storing only the key-value pairs corresponding to land and skipping the water keys -- to graduate to O(m*log m) where m is the total number of lands, rather than the total number of positions. If you cannot (because you must remember the difference between water and not-existing positions, say), you could consider switching to an array as your backing store to graduate to O(n*a n), where a is the inverse Ackermann function, and so the whole shebang would basically be as close to O(n) as it is possible to get without actually being O(n).
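As a rough illustration of the sparse-Map suggestion, the sketch below drops the water entries so that only land keys remain and a missing key is read as water. It assumes, as in the question, that a population of 0 means water; landOnly and isLand are hypothetical helper names.

```haskell
import qualified Data.Map.Strict as M

type Coord = (Int, Int)

-- | Keep only land cells (population > 0); water keys are dropped entirely.
landOnly :: M.Map Coord Int -> M.Map Coord Int
landOnly = M.filter (> 0)

-- | Land test against the sparse map: an absent key is treated as water.
isLand :: M.Map Coord Int -> Coord -> Bool
isLand sparse c = M.member c sparse
```

Note that this loses the distinction between water and positions that do not exist at all, which is exactly the case where the array alternative mentioned above may be the better fit.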

Whether O(m*log m) or O(n*a n) is preferable when both are an option is a matter for empirical exploration on some data sets that you believe represent your typical use case.

Upvotes: 3
