How to make my Haskell code use Laziness and Garbage collector

Question

I wrote a Haskell code which has to solve the following problem : we have n files : f1, f2, f3 .... fn and I cut those files such a way that each slice has 100 lines

  f1_1, f1_2, f1_3 .... f1_m

  f2_1, f2_2, .... f2_n
  ...

  fn_1, fn_2, .... fn_k

finally I construct a special data type (Dags) using slices in the following way

  f1_1, f2_1, f3_1, .... fn_1 => Dag1

  f1_2, f2_2, f3_2, ..... fn_2 => Dag2

  ....

  f1_k, f2_k, f3_k, ..... fn_k => Dagk

the code that I wrote start by cutting all the files, then it couple the i-th elements of the results list and construct Dag using the final result list

it looks like this

  -- # take a filename and cut the file in slices of 100 lines

  sliceFile :: FilePath -> [[String]]

  -- # take a list of lists and group the i-th elements into list

  coupleIthElement :: [[String]] -> [[String]]

  -- # take a list of lines and create a DAG

  makeDags :: [String] ->  Dag

  -- # final code look like this

  makeDag_ :: [FilePath] -> [Dag]

  makeDags files = map makeDags $ coupleIthElement (concat (map sliceFile files))

The problem is that this code is non-efficient because :

it needs storing all the files in memory in list form
the garbage collector is not working efficiently since all fonctions need the results list of the previous fonction

How could I re-write my program to take advantage of garbage collector work and Laziness of Haskell ?

if not possible or easier, what can i do to be more efficient even a bit ?

thanks for reply

edit

coupleIthElement ["abc", "123", "xyz"] must return ["a1x","b2y","c3z"]

of cause the 100 lines are arbitrary selected using a particular criteria upon some element of the lines but i discard this aspect to make the problem more easier to understand,

another edition

data Dag = Dag ([(Int, String)], [((Int, Int), Int)]) deriving Show

test_dag = Dag ([(1, "a"),(2, "b"),(3, "c")],[((1,2),1),((1,3),1)])

test_dag2 = Dag ([],[])

the first list is each vertice define by the number and the label, the second list is the edges ((1,2),3) means edge between vertice 1 and 2 with the cost 3

John L · Accepted Answer

A few points:

1) Have you considered using fgl? It's probably more efficient than your own Dag implementation. If you really need to use Dag, you could construct your graphs with fgl then convert them to Dag when they're complete.

2) It seems like you don't actually use the slices when constructing your graphs, rather they control how many graphs you have. If so, how about something like this:

dagFromHandles :: [Handle] -> IO Dag
dagFromHandles = fmap makeDags . mapM hGetLine

allDags :: [FilePath] -> IO [Dag]
allDags listOfFiles = do
  handles <- mapM (flip openFile ReadMode) listOfFiles
  replicateM 100 (dagFromHandles handles)

This assumes that each file has at least 100 lines, and any extra lines will be ignored. Even better would be if you had a function that would consume a Dag, then you could do

useDag :: Dag -> IO ()

runDags :: [FilePath] -> IO ()
runDags listOfFiles = do
  handles <- mapM (flip openFile ReadMode) listOfFiles
  replicateM_ 100 (dagFromHandles handles >>= useDag)

This should make more efficient use of garbage collection.

Of course this assumes that I understand the problem properly, and I'm not certain that I do. Note that concat (map sliceFile) should be a no-op (sliceFile would need to be in IO as you've defined the type, but ignoring that for now), so I don't see why you're bothering with it at all.

How to make my Haskell code use Laziness and Garbage collector

Answers (2)

Related Questions