Reputation: 293
I'm interested in writing an efficient Haskell function triangularize :: [a] -> [[a]]
that takes a (perhaps infinite) list and "triangularizes" it into a list of lists. For example, triangularize [1..19]
should return
[[1, 3, 6, 10, 15]
,[2, 5, 9, 14]
,[4, 8, 13, 19]
,[7, 12, 18]
,[11, 17]
,[16]]
By efficient, I mean that I want it to run in O(n)
time where n
is the length of the list.
Note that this is quite easy to do in a language like Python, because appending to the end of a list (array) is a constant time operation. A very imperative Python function which accomplishes this is:
def triangularize(elements):
row_index = 0
column_index = 0
diagonal_array = []
for a in elements:
if row_index == len(diagonal_array):
diagonal_array.append([a])
else:
diagonal_array[row_index].append(a)
if row_index == 0:
(row_index, column_index) = (column_index + 1, 0)
else:
row_index -= 1
column_index += 1
return diagonal_array
This came up because I have been using Haskell to write some "tabl" sequences in the On-Line Encyclopedia of Integer Sequences (OEIS), and I want to be able to transform an ordinary (1-dimensional) sequence into a (2-dimensional) sequence of sequences in exactly this way.
Perhaps there's some clever (or not-so-clever) way to foldr
over the input list, but I haven't been able to sort it out.
Upvotes: 13
Views: 327
Reputation: 4733
This appears to be directly related to the set theory argument proving that the set of integer pairs are in one-to-one correspondence with the set of integers (denumerable). The argument involves a so-called Cantor pairing function.
So, out of curiosity, let's see if we can get a diagonalize
function that way.
Define the infinite list of Cantor pairs recursively in Haskell:
auxCantorPairList :: (Integer, Integer) -> [(Integer, Integer)]
auxCantorPairList (x,y) =
let nextPair = if (x > 0) then (x-1,y+1) else (x+y+1, 0)
in (x,y) : auxCantorPairList nextPair
cantorPairList :: [(Integer, Integer)]
cantorPairList = auxCantorPairList (0,0)
And try that inside ghci:
λ> take 15 cantorPairList
[(0,0),(1,0),(0,1),(2,0),(1,1),(0,2),(3,0),(2,1),(1,2),(0,3),(4,0),(3,1),(2,2),(1,3),(0,4)]
λ>
We can number the pairs, and for example extract the numbers for those pairs which have a zero x coordinate:
λ>
λ> xs = [1..]
λ> take 5 $ map fst $ filter (\(n,(x,y)) -> (x==0)) $ zip xs cantorPairList
[1,3,6,10,15]
λ>
We recognize this is the top row from the OP's result in the text of the question. Similarly for the next two rows:
λ>
λ> makeRow xs row = map fst $ filter (\(n,(x,y)) -> (x==row)) $ zip xs cantorPairList
λ> take 5 $ makeRow xs 1
[2,5,9,14,20]
λ>
λ> take 5 $ makeRow xs 2
[4,8,13,19,26]
λ>
From there, we can write our first draft of a diagonalize
function:
λ>
λ> printAsLines xs = mapM_ (putStrLn . show) xs
λ> diagonalize xs = takeWhile (not . null) $ map (makeRow xs) [0..]
λ>
λ> printAsLines $ diagonalize [1..19]
[1,3,6,10,15]
[2,5,9,14]
[4,8,13,19]
[7,12,18]
[11,17]
[16]
λ>
For a list of 1 million items, the runtime is 18 sec, and 145 seconds for 4 millions items. As mentioned by Redu, this seems like O(n√n) complexity.
Distributing the pairs among the various target sublists is inefficient, as most filter operations fail.
To improve performance, we can use a Data.Map structure for the target sublists.
{-# LANGUAGE ExplicitForAll #-}
{-# LANGUAGE ScopedTypeVariables #-}
import qualified Data.List as L
import qualified Data.Map as M
type MIL a = M.Map Integer [a]
buildCantorMap :: forall a. [a] -> MIL a
buildCantorMap xs =
let ts = zip xs cantorPairList -- triplets (a,(x,y))
m0 = (M.fromList [])::MIL a
redOp m (n,(x,y)) = let afn as = case as of
Nothing -> Just [n]
Just jas -> Just (n:jas)
in M.alter afn x m
m1r = L.foldl' redOp m0 ts
in
fmap reverse m1r
diagonalize :: [a] -> [[a]]
diagonalize xs = let cm = buildCantorMap xs
in map snd $ M.toAscList cm
With that second version, performance appears to be much better: 568 msec for the 1 million items list, 2669 msec for the 4 millions item list. So it is close to the O(n*Log(n)) complexity we could have hoped for.
Upvotes: 6
Reputation: 26161
It might be a good idea to craete a comb
filter.
So what does comb
filter do..? It's like splitAt
but instead of splitting at a single index it sort of zips the given infinite list with the given comb to separate the items coressponding to True
and False
in the comb. Such that;
comb :: [Bool] -- yields [True,False,True,False,False,True,False,False,False,True...]
comb = iterate (False:) [True] >>= id
combWith :: [Bool] -> [a] -> ([a],[a])
combWith _ [] = ([],[])
combWith (c:cs) (x:xs) = let (f,s) = combWith cs xs
in if c then (x:f,s) else (f,x:s)
λ> combWith comb [1..19]
([1,3,6,10,15],[2,4,5,7,8,9,11,12,13,14,16,17,18,19])
Now all we need to do is to comb our infinite list and take the fst
as the first row and carry on combing the snd
with the same comb
.
Lets do it;
diags :: [a] -> [[a]]
diags [] = []
diags xs = let (h,t) = combWith comb xs
in h : diags t
λ> diags [1..19]
[ [1,3,6,10,15]
, [2,5,9,14]
, [4,8,13,19]
, [7,12,18]
, [11,17]
, [16]
]
also seems to be lazy too :)
λ> take 5 . map (take 5) $ diags [1..]
[ [1,3,6,10,15]
, [2,5,9,14,20]
, [4,8,13,19,26]
, [7,12,18,25,33]
, [11,17,24,32,41]
]
I think the complexity could be like O(n√n) but i can not make sure. Any ideas..?
Upvotes: 3
Reputation: 152837
Make increasing size chunks:
chunks :: [a] -> [[a]]
chunks = go 0 where
go n [] = []
go n as = b : go (n+1) e where (b,e) = splitAt n as
Then just transpose twice:
diagonalize :: [a] -> [[a]]
diagonalize = transpose . transpose . chunks
Try it in ghci:
> diagonalize [1..19]
[[1,3,6,10,15],[2,5,9,14],[4,8,13,19],[7,12,18],[11,17],[16]]
Upvotes: 13