Peter Kagey
Peter Kagey

Reputation: 293

Triangularizing a list in Haskell

I'm interested in writing an efficient Haskell function triangularize :: [a] -> [[a]] that takes a (perhaps infinite) list and "triangularizes" it into a list of lists. For example, triangularize [1..19] should return

[[1,  3,  6,  10, 15]
,[2,  5,  9,  14]
,[4,  8,  13, 19]
,[7,  12, 18]
,[11, 17]
,[16]]

By efficient, I mean that I want it to run in O(n) time where n is the length of the list.


Note that this is quite easy to do in a language like Python, because appending to the end of a list (array) is a constant time operation. A very imperative Python function which accomplishes this is:

def triangularize(elements):
    row_index = 0
    column_index = 0
    diagonal_array = []
    for a in elements:
        if row_index == len(diagonal_array):
            diagonal_array.append([a])
        else:
            diagonal_array[row_index].append(a)
        if row_index == 0:
            (row_index, column_index) = (column_index + 1, 0)
        else:
            row_index -= 1
            column_index += 1
    return diagonal_array

This came up because I have been using Haskell to write some "tabl" sequences in the On-Line Encyclopedia of Integer Sequences (OEIS), and I want to be able to transform an ordinary (1-dimensional) sequence into a (2-dimensional) sequence of sequences in exactly this way.

Perhaps there's some clever (or not-so-clever) way to foldr over the input list, but I haven't been able to sort it out.

Upvotes: 13

Views: 327

Answers (3)

jpmarinier
jpmarinier

Reputation: 4733

This appears to be directly related to the set theory argument proving that the set of integer pairs are in one-to-one correspondence with the set of integers (denumerable). The argument involves a so-called Cantor pairing function.

So, out of curiosity, let's see if we can get a diagonalize function that way. Define the infinite list of Cantor pairs recursively in Haskell:

auxCantorPairList :: (Integer, Integer) -> [(Integer, Integer)]
auxCantorPairList (x,y) =
    let nextPair = if (x > 0) then (x-1,y+1) else (x+y+1, 0)
    in (x,y) : auxCantorPairList nextPair

cantorPairList :: [(Integer, Integer)]
cantorPairList = auxCantorPairList (0,0)

And try that inside ghci:

 λ> take 15 cantorPairList
[(0,0),(1,0),(0,1),(2,0),(1,1),(0,2),(3,0),(2,1),(1,2),(0,3),(4,0),(3,1),(2,2),(1,3),(0,4)]
 λ> 

We can number the pairs, and for example extract the numbers for those pairs which have a zero x coordinate:

 λ> 
 λ> xs = [1..]
 λ> take 5 $ map fst $ filter (\(n,(x,y)) -> (x==0)) $ zip xs cantorPairList
[1,3,6,10,15]
 λ> 

We recognize this is the top row from the OP's result in the text of the question. Similarly for the next two rows:

 λ> 
 λ> makeRow xs row = map fst $ filter (\(n,(x,y)) -> (x==row)) $ zip xs cantorPairList
 λ> take 5 $ makeRow xs 1
[2,5,9,14,20]
 λ> 
 λ> take 5 $ makeRow xs 2
[4,8,13,19,26]
 λ> 

From there, we can write our first draft of a diagonalize function:

 λ> 
 λ> printAsLines xs = mapM_ (putStrLn . show) xs
 λ> diagonalize xs = takeWhile (not . null) $ map (makeRow xs) [0..]
 λ> 
 λ> printAsLines $ diagonalize [1..19]
[1,3,6,10,15]
[2,5,9,14]
[4,8,13,19]
[7,12,18]
[11,17]
[16]
 λ> 

EDIT: performance update

For a list of 1 million items, the runtime is 18 sec, and 145 seconds for 4 millions items. As mentioned by Redu, this seems like O(n√n) complexity.

Distributing the pairs among the various target sublists is inefficient, as most filter operations fail.

To improve performance, we can use a Data.Map structure for the target sublists.


{-#  LANGUAGE  ExplicitForAll       #-}
{-#  LANGUAGE  ScopedTypeVariables  #-}

import qualified  Data.List  as  L
import qualified  Data.Map   as  M

type MIL a = M.Map Integer [a]

buildCantorMap :: forall a.  [a] -> MIL a
buildCantorMap xs = 
    let   ts     =  zip xs cantorPairList -- triplets (a,(x,y))
          m0     = (M.fromList [])::MIL a
          redOp m (n,(x,y)) = let  afn as = case as of
                                              Nothing  -> Just [n]
                                              Just jas -> Just (n:jas)
                              in   M.alter afn x m
          m1r = L.foldl' redOp m0 ts
    in
          fmap reverse m1r

diagonalize :: [a] -> [[a]]
diagonalize xs = let  cm = buildCantorMap xs
                 in   map snd $ M.toAscList cm


With that second version, performance appears to be much better: 568 msec for the 1 million items list, 2669 msec for the 4 millions item list. So it is close to the O(n*Log(n)) complexity we could have hoped for.

Upvotes: 6

Redu
Redu

Reputation: 26161

It might be a good idea to craete a comb filter.

So what does comb filter do..? It's like splitAt but instead of splitting at a single index it sort of zips the given infinite list with the given comb to separate the items coressponding to True and False in the comb. Such that;

comb :: [Bool]  -- yields [True,False,True,False,False,True,False,False,False,True...]
comb = iterate (False:) [True] >>= id

combWith :: [Bool] -> [a] -> ([a],[a])
combWith _ []          = ([],[])
combWith (c:cs) (x:xs) = let (f,s) = combWith cs xs
                         in if c then (x:f,s) else (f,x:s)

λ> combWith comb [1..19]
([1,3,6,10,15],[2,4,5,7,8,9,11,12,13,14,16,17,18,19])

Now all we need to do is to comb our infinite list and take the fst as the first row and carry on combing the snd with the same comb.

Lets do it;

diags :: [a] -> [[a]]
diags [] = []
diags xs = let (h,t) = combWith comb xs
           in h : diags t

λ> diags [1..19]
[ [1,3,6,10,15]
, [2,5,9,14]
, [4,8,13,19]
, [7,12,18]
, [11,17]
, [16]
]

also seems to be lazy too :)

λ> take 5 . map (take 5) $ diags [1..]
[ [1,3,6,10,15]
, [2,5,9,14,20]
, [4,8,13,19,26]
, [7,12,18,25,33]
, [11,17,24,32,41]
]

I think the complexity could be like O(n√n) but i can not make sure. Any ideas..?

Upvotes: 3

Daniel Wagner
Daniel Wagner

Reputation: 152837

Make increasing size chunks:

chunks :: [a] -> [[a]]
chunks = go 0 where
    go n [] = []
    go n as = b : go (n+1) e where (b,e) = splitAt n as

Then just transpose twice:

diagonalize :: [a] -> [[a]]
diagonalize = transpose . transpose . chunks

Try it in ghci:

> diagonalize [1..19]
[[1,3,6,10,15],[2,5,9,14],[4,8,13,19],[7,12,18],[11,17],[16]]

Upvotes: 13

Related Questions