Brendan Langfield
Brendan Langfield

Reputation: 585

Efficient connected components from a 2D array

In an m x n matrix consisting of 0s and 1s, we are tasked with counting the 4-directionally connected islands of 1s. A python implementation is as follows:

    def numIslands(grid):
        def sink(i, j):
            if 0 <= i < len(grid) and 0 <= j < len(grid[i]) and grid[i][j] == '1':
                grid[i][j] = '0'
                list(map(sink, (i+1, i-1, i, i), (j, j, j+1, j-1)))
                return 1
            return 0
        return sum(sink(i, j) for i in range(len(grid)) for j in range(len(grid[i])))

I have tried and failed to solve this problem in Haskell under the following constraints:

Things I've tried:

The issue in all cases is the plumbing code to convert the matrix into any nice graph representation where we can appeal to a library function is always too long.

Here you can see a working implementation using Algebra.Graph.AdjacencyMap that does a lot of redundant work:

data Coord = C Int Int deriving (Eq, Ord, Show)
data Cell a = Cell Coord a deriving (Eq, Ord)
type A = Cell Char

coord :: Cell a -> Coord
coord (Cell c _) = c

mkcell :: Int -> Int -> a -> Cell a
mkcell i j = Cell (C i j)

cells :: [[a]] -> [[Cell a]]
cells = zipWith (map . uncurry . mkcell) [0..] . map (zip [0..])

is1 :: A -> Bool
is1 (Cell _ '1') = True
is1 (Cell _  _ ) = False

cardinals :: Coord -> [Coord]
cardinals c = map ($ c) [up, down, left, right]
  where
    up    (C i j) = C (i+1) j
    down  (C i j) = C (i-1) j
    left  (C i j) = C i (j-1)
    right (C i j) = C i (j+1)

coordMap :: [[a]] -> Map Coord [a]
coordMap board = M.fromAscList [(C i j, [z]) | (i,zs) <- zip [0..] board, (j,z) <- zip [0..] zs]

mkstars :: [[A]] -> [(A, [A])]
mkstars xss = map (id &&& nbs) $ ones =<< xss
  where
    ones  = filter is1
    value = flip (M.findWithDefault []) (coordMap xss)
    nbs   = ones . value <=< cardinals . coord

islands :: [[Char]] -> Int
islands = length . dfsForest . stars . mkstars . cells

And some simple test cases:

main :: IO ()
main = do
  hspecWith defaultConfig {configFailFast = True} $ do
    describe "islands" $ do

      it "handles test case 0" $ do
        islands [ ['1', '1', '1', '1', '0']
                , ['1', '1', '0', '1', '0']
                , ['1', '1', '0', '0', '0']
                , ['0', '0', '0', '0', '0'] ] `shouldBe` 1

      it "handles test case 1" $ do
        islands [ ['1', '1', '0', '0', '0']
                , ['1', '1', '0', '0', '0']
                , ['0', '0', '1', '0', '0']
                , ['0', '0', '0', '1', '1'] ] `shouldBe` 3

      it "handles test case 2" $ do
        islands [['1']] `shouldBe` 1

Notice that we perform a pair of zip [0..]-like operations in two places. I think this ought to be unnecessary. This clocks in at 220 tokens according to vim, and that doesn't even include the dfsForest logic or the stuff from Data.Map.Strict! It feels ridiculous that programming in Haskell should be so verbose and complex, so I'm hoping it's just a skill issue.

Can anyone point me in the right direction? I'd also be happy with an answer of the form, "this is not possible but [here] is the best we can do."

If it helps, ultimately, my goal is to come up with a standard method for very concisely marshalling 2D array problems into a format compatible with the graph algorithms implemented in either containers, algebraic-graphs, or fgl. Although right now the stuff in algebraic-graphs is looking the nicest to me.

Upvotes: 0

Views: 103

Answers (2)

Brendan Langfield
Brendan Langfield

Reputation: 585

Adding a slightly simplified version of my original attempt for posterity.

import Control.Arrow hiding (loop, left, right)

import Data.Maybe
import Data.Map.Strict (Map, (!?))

import Algebra.Graph.AdjacencyMap hiding (transpose, tree)
import Algebra.Graph.AdjacencyMap.Algorithm

import qualified Data.Map.Strict as M

type Coord = (Int, Int)
data Cell a = Cell { coord :: Coord, val :: a } deriving (Eq, Ord, Show)

cells :: [[a]] -> [Cell a]
cells xss = [Cell (i,j) x | (i, xs) <- zip [0..] xss, (j, x) <- zip [0..] xs]

mkstars :: (a -> Bool) -> [Cell a] -> [(Cell a, [Cell a])]
mkstars p cs = [(c, filter nice (nbs c)) | c <- cs, nice c]
  where
    nbs  = mapMaybe (vals !?) . cardinals . coord
    vals = M.fromAscList [(coord c, c) | c <- cs]
    nice = p . val
    cardinals (i,j) = [(i,j+1),(i,j-1),(i+1,j),(i-1,j)]

islands :: [[Char]] -> Int
islands = length . dfsForest . stars . mkstars (== '1') . cells

This is a fair bit shorter at ~137 tokens with type signatures, ~108 without. Note that you can see the bones of a general approach to the flavor of matrix search problems that involve some kind of aggregation (rather than returning the entire matrix modified).

We have some predicate p which denotes which entries of the matrix are actually part of our graph, and the nbs function could be factored out into something that modifies state in the Cell type, so that we could do things like store a running path or the tail of a word we're search for.

However, it may be the case that it can be expressed even more simply using the Data.Vector API, since it gives us nice things like imap and ifoldl'. (Thanks Daniel Wagner for introducing these to me!)

Upvotes: 1

Daniel Wagner
Daniel Wagner

Reputation: 153172

Here's a fairly concise implementation:

{-# Language BlockArguments #-}

import Data.Partition
import Data.Vector (Vector)
import qualified Data.Set as S
import qualified Data.Vector as V

islands :: Vector (Vector Bool) -> Int
islands = length . nontrivialSets . fromSets . V.toList . V.concat . V.toList . \grid ->
    flip V.imap grid \i row ->
    flip V.imap row \j val -> S.fromList
    [ (i',j',lmao)
    | val
    , (i',j') <- [(i,j), (i-1,j), (i+1,j), (i,j-1), (i,j+1)]
    , or (grid V.!? i' >>= (V.!? j'))
    , lmao <- "01"
    ]

This uses data-partition and incurs a logarithmic overhead, as is common when translating mutable algorithms to immutable analogs naively, so runtime is O(mn log(mn)).* The lmao variable doubles the size of each island; nontrivialSets only reports sets with size strictly greater than one, and we want to report on islands with just one node in them.

See it run in ghci:

> islands . fmap (fmap ('1'==)) . read $ "[\"11000\", \"11000\", \"00100\", \"00011\"]"
3

A similar technique can be done with containers if you prefer.

{-# Language BlockArguments #-}
import Data.Graph
import Data.Vector (Vector)
import qualified Data.Vector as V

-- I hate myself for this name. I also love myself for this name
ifordl' :: Vector a -> b -> (b -> Int -> a -> b) -> b
ifordl' v z f = V.ifoldl' f z v

islands :: Vector (Vector Bool) -> Int
islands = length . scc . (\(g, _, _) -> g) . graphFromEdges . \grid ->
    ifordl' grid [] \nodes i row ->
    ifordl' row nodes \nodes' j val ->
    [((), (i, j), filter (\(i',j') -> or (grid V.!? i >>= (V.!? j)))
                         [(i+1,j), (i-1,j), (i,j+1), (i,j-1)])
    |val] ++ nodes'

As with the previous approach, this is O(mn log(mn)).


* It is well-known that any mutable algorithm can be made pure with an O(log n) multiplicative penalty. I believe it is an open question whether there are any problems which provably require that penalty, so it's likely that a cleverer theoretician than I could improve from here!

Upvotes: 3

Related Questions