akucheck

Reputation: 195

In F#, How do I use Seq.unfold in the context of a larger pipeline?

I have a CSV file with two columns, text and count. The goal is to transform the file from this:

some text once,1
some text twice,2
some text thrice,3

To this:

some text once,1
some text twice,1
some text twice,1
some text thrice,1
some text thrice,1
some text thrice,1

repeating each line count times and spreading the count over that many lines.

This seems to me like a good candidate for Seq.unfold, generating the additional lines, as we read the file. I have the following generator function:

let expandRows (text:string, number:int32) =
    if number = 0 
    then None
    else
        let element = text                  // "element" will be in the generated sequence
        let nextState = (element, number-1) // threaded state replacing looping 
        Some (element, nextState)

FSI yields the following function signature:

val expandRows : text:string * number:int32 -> (string * (string * int32)) option

Executing the following in FSI:

let expandedRows = Seq.unfold expandRows ("some text thrice", 3)

yields the expected:

val it : seq<string> = seq ["some text thrice"; "some text thrice"; "some text thrice"]

The question is: how do I plug this into the context of a larger ETL pipeline? For example:

File.ReadLines(inFile)                  
    |> Seq.map createTupleWithCount
    |> Seq.unfold expandRows // type mismatch here
    |> Seq.iter outFile.WriteLine

The error below is on expandRows in the context of the pipeline.

Type mismatch. 
Expecting a 'seq<string * int32> -> ('a * seq<string * int32>) option'    
but given a     'string * int32 -> (string * (string * int32)) option' 
The type    'seq<string * int32>' does not match the type 'string * int32'

I was expecting expandRows to return a seq of string, as in my isolated test. As that is neither the "Expecting" nor the "given", I'm confused. Can someone point me in the right direction?

A gist for the code is here: https://gist.github.com/akucheck/e0ff316e516063e6db224ab116501498

Upvotes: 7

Views: 609

Answers (3)

Mark Seemann

Reputation: 233135

In this case, since you simply want to repeat a value a number of times, there's no reason to use Seq.unfold. You can use Seq.replicate instead:

// 'a * int -> seq<'a>
let expandRows (text, number) = Seq.replicate number text

You can use Seq.collect to compose it:

File.ReadLines(inFile)
|> Seq.map createTupleWithCount
|> Seq.collect expandRows
|> Seq.iter outFile.WriteLine

In fact, the only work performed by this version of expandRows is to 'unpack' a tuple and compose its values into curried form.

While F# doesn't come with such a generic function in its core library, you can easily define it (and other similarly useful functions):

module Tuple2 =
    let curry f x y = f (x, y)    
    let uncurry f (x, y) = f x y    
    let swap (x, y) = (y, x)

This would enable you to compose your pipeline from well-known functional building blocks:

File.ReadLines(inFile)
|> Seq.map createTupleWithCount
|> Seq.collect (Tuple2.swap >> Tuple2.uncurry Seq.replicate)
|> Seq.iter outFile.WriteLine
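
To try the composed pipeline without touching a file, here is a minimal sketch over in-memory lines. The question never shows createTupleWithCount, so the version below is an assumed stand-in that splits each line on its last comma; the input list simply mirrors the sample CSV rows from the question.

```fsharp
module Tuple2 =
    let uncurry f (x, y) = f x y
    let swap (x, y) = (y, x)

// Hypothetical stand-in for the question's createTupleWithCount (not shown there):
// splits "text,count" on the last comma and parses the count as an int.
let createTupleWithCount (line: string) =
    let i = line.LastIndexOf(',')
    (line.Substring(0, i), int (line.Substring(i + 1)))

// In-memory replacement for File.ReadLines(inFile)
let input = [ "some text once,1"; "some text twice,2"; "some text thrice,3" ]

let expanded =
    input
    |> Seq.map createTupleWithCount
    // swap turns (text, count) into (count, text), matching Seq.replicate's argument order
    |> Seq.collect (Tuple2.swap >> Tuple2.uncurry Seq.replicate)
    |> Seq.toList

expanded |> List.iter (printfn "%s")
```

The swap is needed because the tuple is (text, count) while Seq.replicate takes the count first.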

Upvotes: 6

Fyodor Soikin

Reputation: 80734

Seq.map produces a sequence, but Seq.unfold does not take a sequence; it takes a single initial state value. So you can't directly pipe the output of Seq.map into Seq.unfold. You need to apply it element by element instead.

But then, for each element your Seq.unfold will produce a sequence, so the ultimate result will be a sequence of sequences. You can collect all those "subsequences" in a single sequence with Seq.collect:

File.ReadLines(inFile) 
    |> Seq.map createTupleWithCount 
    |> Seq.collect (Seq.unfold expandRows)
    |> Seq.iter outFile.WriteLine

Seq.collect takes a function and an input sequence. For every element of the input sequence, the function is supposed to produce another sequence, and Seq.collect will concatenate all those sequences in one. You may think of Seq.collect as Seq.map and Seq.concat combined in one function. Also, if you're coming from C#, Seq.collect is called SelectMany over there.
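
As a small sketch of that equivalence, the snippet below runs the same expansion both ways on an in-memory list. expandRows is condensed from the question; the rows list is made-up sample data standing in for the output of Seq.map createTupleWithCount.

```fsharp
// expandRows, condensed from the question
let expandRows (text: string, number: int32) =
    if number = 0 then None
    else Some (text, (text, number - 1))

// Assumed sample data, standing in for the parsed CSV rows
let rows = [ ("some text once", 1); ("some text twice", 2) ]

// Partially applied, Seq.unfold expandRows has type string * int32 -> seq<string>,
// which is the per-element function that Seq.collect expects.
let viaCollect =
    rows |> Seq.collect (Seq.unfold expandRows) |> Seq.toList

// The same result in two steps: map to a seq<seq<string>>, then flatten.
let viaMapConcat =
    rows |> Seq.map (Seq.unfold expandRows) |> Seq.concat |> Seq.toList

printfn "%b" (viaCollect = viaMapConcat)
```

Both lists should contain "some text once" once and "some text twice" twice, in that order.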

Upvotes: 6

Ringil

Reputation: 6527

Sounds like what you want to do is actually

File.ReadLines(inFile)                  
|> Seq.map createTupleWithCount
|> Seq.map (Seq.unfold expandRows) // Map each tuple to a seq<string>
|> Seq.concat // Flatten the seq<seq<string>> to seq<string>
|> Seq.iter outFile.WriteLine

as it seems that you want to convert each tuple with count in your sequence into a seq<string> via Seq.unfold and expandRows. This is done by mapping.

Afterwards, you want to flatten your seq<seq<string>> into a single seq<string>, which is done via Seq.concat.

Upvotes: 2
