Molochdaa

Reputation: 2218

In F#, how to get head/tail of a seq without re-evaluating the seq

I'm reading a file, and I want to do one thing with the first line and something else with all the other lines:

let lines = System.IO.File.ReadLines "filename.txt" |> Seq.map (fun r -> r.Trim())

let head = Seq.head lines
let tail = Seq.tail lines


Problem: the call to tail fails because the TextReader is already closed. This means the Seq is evaluated twice: once to get the head and once to get the tail.

How can I get the first line and the remaining lines, while keeping a Seq and without re-evaluating it?

The signature could be, for example:

let fn: ('a -> seq<'a> -> 'b) -> seq<'a> -> 'b

Upvotes: 6

Views: 709

Answers (3)

Stephen Hosking

Reputation: 1465

I had an important use case for this: I use Seq.unfold to read a large number of blocks via REST reads and process each block sequentially, with further REST reads.

The sequence had to be read lazily, but also cached to avoid re-evaluation on every Seq.tail operation.

Hence finding this question and the accepted answer (Seq.cache). Thanks!

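I experimented with Seq.cache and found that it works as claimed (i.e., it is lazy and avoids re-evaluation), with one noteworthy caveat: in my experiment the first five elements of the sequence are always read up front (and retained by the cache), so experiments with five or fewer elements won't show lazy evaluation. After the first five, each element is read lazily. This is most likely an effect of the %A formatting in the test code below, which enumerates the first few elements of a sequence in order to print them, rather than of Seq.cache itself.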

This code can be used to experiment. Try it with 5 and see no lazy evaluation, then with 10 and see each element after the fifth being read lazily, as required. Also remove Seq.cache to see the problem being addressed (re-evaluation).

// Get a Sequence of numbers.
let getNums n  = seq { for i in 1..n do printfn "Yield { %d }" i; yield i}

// Unfold a sequence of numbers
let unfoldNums (nums : int seq) =
    nums
    |> Seq.unfold
        (fun (nums : int seq) ->
            printfn "unfold: nums = { %A }" nums
            if Seq.isEmpty nums then
                printfn "Done"
                None
            else
                let num = Seq.head nums // Value to yield
                let tl = Seq.tail nums // Next State. CAUSES RE-EVALUATION!
                printfn "Yield: < %d >, tl =  { %A }" num tl
                Some (num,tl))
    
// Get n numbers as a sequence, then unfold them as a sequence
// Observe that with 'Seq.cache' input is not re-evaluated unnecessarily,
// and also that lazy evaluation kicks in for n > 5
let experiment n =
    getNums n
    |> Seq.cache
    // Without cache, Seq.tail causes the sequence to be re-evaluated
    |> unfoldNums
    |> Seq.iter (fun x -> printfn "Process: %d" x)
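
As a cross-check that does not depend on printed output, here is a minimal sketch that counts how many elements the source has produced. It avoids %A formatting, because printing a sequence with %A enumerates its first few elements. The names produced, source and cached are hypothetical and exist only for this illustration.

```fsharp
// Hypothetical counter: incremented each time the source produces an element.
let produced = ref 0

let source =
    seq {
        for i in 1 .. 10 do
            produced.Value <- produced.Value + 1
            yield i
    }

let cached = Seq.cache source

// Ask for the head only, then record how many elements the source produced.
let first = Seq.head cached
let afterHead = produced.Value

// Ask for the head of the tail, then record the count again.
let second = cached |> Seq.tail |> Seq.head
let afterTail = produced.Value

printfn "first = %d, produced so far = %d" first afterHead
printfn "second = %d, produced so far = %d" second afterTail
```

If Seq.cache pulls elements strictly on demand, as its documentation describes, the two reported counts should be 1 and 2.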

Upvotes: 0

Aaron M. Eshbach

Reputation: 6510

I generally use a seq expression in which the Stream is scoped inside the expression. That will allow you to enumerate the sequence fully before the stream is disposed. I usually use a function like this:

open System.IO

let readLines file =
    seq {
        // 'use' keeps the reader open until enumeration ends or the enumerator is disposed
        use stream = File.OpenText file
        while not stream.EndOfStream do
            yield stream.ReadLine().Trim()
    }

Then you should be able to call Seq.head to get the first line in the file, and Seq.last to get the last line in the file. I think this will technically create two different enumerators, though, so the file is opened twice. If you want to read the file exactly once, then materializing the sequence to a list or using a function like Seq.cache will be your best option.
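
A minimal sketch of the list option, assuming the whole file fits in memory (this gives up laziness, so it is not a fit if the tail must stay a Seq). The temporary file and the names path and result are hypothetical, used only to make the example self-contained.

```fsharp
open System.IO

// Same shape as the readLines function above, repeated so the sketch runs on its own.
let readLines (file: string) =
    seq {
        use reader = File.OpenText file
        while not reader.EndOfStream do
            yield reader.ReadLine().Trim()
    }

// Hypothetical sample file, created only for this demonstration.
let path = Path.GetTempFileName()
File.WriteAllLines(path, [| " header "; " row 1 "; " row 2 " |])

// List.ofSeq enumerates the sequence once, so the file is opened and read a single time.
let result =
    match readLines path |> List.ofSeq with
    | first :: rest -> Some (first, rest)
    | [] -> None // empty file: no head to split off

File.Delete path
```

Pattern matching on the list also handles the empty-file case explicitly, whereas Seq.head raises an exception on an empty sequence.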

Upvotes: 6

Wesley Wiser

Reputation: 9851

The easiest thing to do is probably to wrap your lines sequence with Seq.cache:

let lines =
  System.IO.File.ReadLines "filename.txt"
  |> Seq.map (fun r -> r.Trim())
  |> Seq.cache

Of note from the documentation:

This result sequence will have the same elements as the input sequence. The result can be enumerated multiple times. The input sequence is enumerated at most once and only as far as is necessary. Caching a sequence is typically useful when repeatedly evaluating items in the original sequence is computationally expensive or if iterating the sequence causes side-effects that the user does not want to be repeated multiple times.
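
Building on that, a function with the signature from the question could be sketched as follows. The name withHeadAndTail and the in-memory sample sequence are hypothetical. The starts counter is only there to show how often the underlying sequence begins enumerating.

```fsharp
// Hypothetical helper: caches the source, then passes its head and tail to f.
let withHeadAndTail (f: 'a -> seq<'a> -> 'b) (source: seq<'a>) : 'b =
    let cached = Seq.cache source
    f (Seq.head cached) (Seq.tail cached)

// Hypothetical counter: incremented each time the source starts enumerating.
let starts = ref 0

// Stand-in for File.ReadLines, so the sketch needs no file on disk.
let lines =
    seq {
        starts.Value <- starts.Value + 1
        yield " header "
        yield " row 1 "
        yield " row 2 "
    }
    |> Seq.map (fun (r: string) -> r.Trim())

// The tail is still a lazy seq here; List.ofSeq just forces it for display.
let header, body =
    lines |> withHeadAndTail (fun h t -> h, List.ofSeq t)

printfn "header = %s, body = %A, source started %d time(s)" header body starts.Value
```

Note that Seq.head raises an exception on an empty sequence, so a production version would probably check Seq.isEmpty on the cached sequence first.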

Upvotes: 10
