Jamie Dixon

FSharp: Using CSV Type Provider Async

I am using the CSV type provider to collect some data from a series of files I have in Azure Blob Storage:

#r "../packages/FSharp.Data.2.0.9/lib/portable-net40+sl5+wp8+win8/FSharp.Data.dll"
open FSharp.Data

type censusDataContext = CsvProvider<"https://portalvhdspgzl51prtcpfj.blob.core.windows.net/censuschicken/AK.TXT">
type stateCodeContext = CsvProvider<"https://portalvhdspgzl51prtcpfj.blob.core.windows.net/censuschicken/states.csv">

let stateCodes =  stateCodeContext.Load("https://portalvhdspgzl51prtcpfj.blob.core.windows.net/censuschicken/states.csv");

let fetchStateData (stateCode:string)=
        let uri = System.String.Format("https://portalvhdspgzl51prtcpfj.blob.core.windows.net/censuschicken/{0}.TXT",stateCode)
        censusDataContext.Load(uri).Rows

let usaData = stateCodes.Rows 
                |> Seq.collect(fun r -> fetchStateData(r.Abbreviation))
                |> Seq.length

I now want to run these asynchronously, and I am running into a problem with AsyncLoad:

let fetchStateDataAsync(stateCode:string)=
    async{
        let uri = System.String.Format("https://portalvhdspgzl51prtcpfj.blob.core.windows.net/censuschicken/{0}.TXT",stateCode)
        let! stateData =  censusDataContext.AsyncLoad(uri)
        return stateData.Rows
    }

let usaData = stateCodes.Rows 
                |> Seq.collect(fun r -> fetchStateDataAsync(r.Abbreviation))
                |> Seq.length

The error message is:

The type 'Async<seq<CsvProvider<...>.Row>>' is not compatible with the type 'seq<'a>'

Forgive my lack of async knowledge, but do I have to use something other than Seq.collect when applying async functions?

Thanks in advance

Answers (1)

Tomas Petricek

The problem is that making code asynchronous (by wrapping it in an async { .. } block) changes the result from seq<Row> to Async<seq<Row>>. That is, you now get an asynchronous computation that will eventually complete and return the sequence, rather than the sequence itself. Seq.collect expects a function that returns a sequence, which is why the types do not match.
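To see the type change in isolation, here is a minimal sketch without the type provider. The functions `fetchSync` and `fetchAsync` are hypothetical stand-ins that return strings instead of CSV rows; only the shape of the types matters here:

```fsharp
// Hypothetical stand-in for fetchStateData: returns a sequence directly
let fetchSync (stateCode: string) : seq<string> =
    seq { yield stateCode + "-row1"; yield stateCode + "-row2" }

// Same body wrapped in async { .. }: the return type becomes Async<seq<string>>
let fetchAsync (stateCode: string) : Async<seq<string>> =
    async { return fetchSync stateCode }

// Seq.collect expects a function that returns a sequence, so this compiles
let syncCount = [ "AK"; "AL" ] |> Seq.collect fetchSync |> Seq.length

// This would NOT compile: fetchAsync returns Async<seq<string>>, not seq<'a>
// let asyncCount = [ "AK"; "AL" ] |> Seq.collect fetchAsync |> Seq.length

// To get at the sequence, the async computation has to be run first
let asyncRows = fetchAsync "AK" |> Async.RunSynchronously
```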

To fix this, you need to start the computation and wait for its result. There are a number of options, such as running the computations one by one sequentially. Probably the easiest option (and maybe the best, depending on what you want to do) is to run them in parallel:

let getAll = 
  stateCodes.Rows 
  |> Seq.map(fun r -> fetchStateDataAsync(r.Abbreviation))
  |> Async.Parallel

This gives you an asynchronous computation that runs all the downloads and returns an array of results (one sequence of rows per state). You can run it synchronously (blocking until it finishes), then flatten and count the results:

getAll |> Async.RunSynchronously
       |> Seq.collect id
       |> Seq.length
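If you want to try the same Async.Parallel pattern without access to the blob storage files, here is a self-contained sketch. `fakeFetchAsync` is a hypothetical stand-in for `fetchStateDataAsync` that simulates a download delay with `Async.Sleep` and returns strings instead of CSV rows:

```fsharp
// Hypothetical stand-in for fetchStateDataAsync: simulates a download delay
let fakeFetchAsync (stateCode: string) : Async<seq<string>> =
    async {
        do! Async.Sleep 100
        return seq { for i in 1 .. 3 -> sprintf "%s-%d" stateCode i }
    }

let codes = [ "AK"; "AL"; "AR" ]

// Start all (simulated) downloads in parallel and block until every one finishes
let results : seq<string>[] =
    codes
    |> Seq.map fakeFetchAsync
    |> Async.Parallel
    |> Async.RunSynchronously

// Async.Parallel returns the results in the same order as the input computations
let total = results |> Seq.collect id |> Seq.length
printfn "Length %d" total
```

Because the results come back in input order, `results.[0]` holds the rows for the first state code even though the downloads may complete in any order.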

If you want to run the downloads asynchronously in the background, you can do that too, but you need to specify what to do with the result. For example:

async { 
  let! all = getAll
  all |> Seq.collect id |> Seq.length |> printfn "Length %d" }
|> Async.Start
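The sequential option mentioned earlier (running the downloads one by one) can be sketched as follows. Again, `fakeFetchAsync` is a hypothetical stand-in for `fetchStateDataAsync`, and `fetchAllSequentially` is an illustrative helper name. Each download is awaited with `let!` before the next one starts, so only one request is in flight at a time:

```fsharp
// Hypothetical stand-in for fetchStateDataAsync
let fakeFetchAsync (stateCode: string) : Async<seq<string>> =
    async {
        do! Async.Sleep 50
        return seq { for i in 1 .. 2 -> sprintf "%s-%d" stateCode i }
    }

// Run the computations sequentially: each one finishes before the next starts
let fetchAllSequentially (codes: seq<string>) : Async<seq<string> list> =
    async {
        let results = ResizeArray<seq<string>>()
        for code in codes do
            let! rows = fakeFetchAsync code
            results.Add rows
        return List.ofSeq results
    }

let total =
    fetchAllSequentially [ "AK"; "AL"; "AR" ]
    |> Async.RunSynchronously
    |> Seq.collect id
    |> Seq.length
printfn "Length %d" total
```

This is slower than the parallel version when each download takes a while, but it avoids hitting the server with many requests at once.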
