Fopa Léon Constantin
Fopa Léon Constantin

Reputation: 12363

How to make my Haskell code use Laziness and Garbage collector

I wrote a Haskell code which has to solve the following problem : we have n files : f1, f2, f3 .... fn and I cut those files such a way that each slice has 100 lines

  f1_1, f1_2, f1_3 .... f1_m

  f2_1, f2_2, .... f2_n
  ...

  fn_1, fn_2, .... fn_k

finally I construct a special data type (Dags) using slices in the following way

  f1_1, f2_1, f3_1, .... fn_1 => Dag1

  f1_2, f2_2, f3_2, ..... fn_2 => Dag2

  ....

  f1_k, f2_k, f3_k, ..... fn_k => Dagk

the code that I wrote start by cutting all the files, then it couple the i-th elements of the results list and construct Dag using the final result list

it looks like this

  -- # take a filename and cut the file in slices of 100 lines

  sliceFile :: FilePath -> [[String]]

  -- # take a list of lists and group the i-th elements into list

  coupleIthElement :: [[String]] -> [[String]]

  -- # take a list of lines and create a DAG

  makeDags :: [String] ->  Dag

  -- # final code look like this

  makeDag_ :: [FilePath] -> [Dag]

  makeDags files = map makeDags $ coupleIthElement (concat (map sliceFile files))

The problem is that this code is non-efficient because :

How could I re-write my program to take advantage of garbage collector work and Laziness of Haskell ?

if not possible or easier, what can i do to be more efficient even a bit ?

thanks for reply


edit

coupleIthElement ["abc", "123", "xyz"] must return ["a1x","b2y","c3z"]

of cause the 100 lines are arbitrary selected using a particular criteria upon some element of the lines but i discard this aspect to make the problem more easier to understand,

another edition

data Dag = Dag ([(Int, String)], [((Int, Int), Int)]) deriving Show

test_dag = Dag ([(1, "a"),(2, "b"),(3, "c")],[((1,2),1),((1,3),1)])

test_dag2 = Dag ([],[])

the first list is each vertice define by the number and the label, the second list is the edges ((1,2),3) means edge between vertice 1 and 2 with the cost 3

Upvotes: 2

Views: 440

Answers (2)

John L
John L

Reputation: 28097

A few points:

1) Have you considered using fgl? It's probably more efficient than your own Dag implementation. If you really need to use Dag, you could construct your graphs with fgl then convert them to Dag when they're complete.

2) It seems like you don't actually use the slices when constructing your graphs, rather they control how many graphs you have. If so, how about something like this:

dagFromHandles :: [Handle] -> IO Dag
dagFromHandles = fmap makeDags . mapM hGetLine

allDags :: [FilePath] -> IO [Dag]
allDags listOfFiles = do
  handles <- mapM (flip openFile ReadMode) listOfFiles
  replicateM 100 (dagFromHandles handles)

This assumes that each file has at least 100 lines, and any extra lines will be ignored. Even better would be if you had a function that would consume a Dag, then you could do

useDag :: Dag -> IO ()

runDags :: [FilePath] -> IO ()
runDags listOfFiles = do
  handles <- mapM (flip openFile ReadMode) listOfFiles
  replicateM_ 100 (dagFromHandles handles >>= useDag)

This should make more efficient use of garbage collection.

Of course this assumes that I understand the problem properly, and I'm not certain that I do. Note that concat (map sliceFile) should be a no-op (sliceFile would need to be in IO as you've defined the type, but ignoring that for now), so I don't see why you're bothering with it at all.

Upvotes: 2

fuz
fuz

Reputation: 92976

If it's not needed to process your file in slices, avoid this. Haskell does this automatically! In Haskell, you think of IO as a stream. Data is read from input, as soon as it's needed and discarded, as soon as it's unused. So for instance, this is an easy file-copying programm:

main = interact id

interact has the signature interact :: (String -> String) -> IO (), and feeds the input into a function which handles it and produces some output, which is written to stdout. This program is more efficient then most C-implementations, as the runtime automatically buffers the input and output.

If you want to understand laziness, you have to forget all the wisdom you learned as a imperative programmer and have to think about a program as a description to modify data, not as a set of instructions - data is only processed when needed!

The key point, why your data may be handled the wrong way is the multiple traversion of the list. Your function makeDags traverses the transposed the slices list one by one, so the elements of the original list may not be discarded. What you should try, is to write your function in a way like this:

sliceFile :: FilePath -> [[String]]
sliceFile fp = do
  f <- readFile fp
  let l = lines fp
      slice [] = []
      slice x  = ll : slice ls where (ll,ls) = splitAt 100 x
  return slice l

sliceFirstRow :: [[String]] -> ([String],[[String]])
sliceFirstRow list = unzip $ map (\(x:xs) -> (x,xs)) list

makeDags :: [[String]] -> [Dag]
makeDags [[]] = []
makeDags list = makeDag firstRow : makeDags restOfList where
 (firstRow,restOfList) = sliceFirstRow list

This function may be a solution, since the first row is no longer referenced, when it's done. But in the most places, this is a result of laziness, so you could probably try to use seq to force building the Dags and allowing the IO data to be garbage-collected. (If you don't force building the dags, the data can't be garbage collected).

But anyway, I could provide a more helpfull answer, if you give some informations about what these dags are.

Upvotes: 1

Related Questions