BitTickler

Reputation: 11927

HTTP download to disk with FSharp.Data.dll and async workflows stalls

The following .fsx file is supposed to download binary tablebase files, which are posted as links in an HTML page on the internet, and save them to disk, using FSharp.Data.dll.

What happens is that the whole thing stalls after a while, well before it is done, without even throwing an exception.

I am pretty sure I am mishandling CopyToAsync() in my async workflow. As this is supposed to run while I go for a nap, it would be nice if someone could tell me how to do it correctly. (In more general terms: how do I handle a System.Threading.Tasks.Task in an async workflow?)

#r @"E:\R\playground\DataTypeProviderStuff\packages\FSharp.Data.2.2.3\lib\net40\FSharp.Data.dll"

open FSharp.Data
open Microsoft.FSharp.Control.CommonExtensions
let document = HtmlDocument.Load("http://www.olympuschess.com/egtb/gaviota/")
let links = 
    document.Descendants ["a"] |> Seq.choose (fun x -> x.TryGetAttribute("href") |> Option.map (fun a -> a.Value()))
    |> Seq.filter (fun v -> v.EndsWith(".cp4"))
    |> List.ofSeq

let targetFolder = @"E:\temp\tablebases\"
let downloadUrls = 
    links |> List.map (fun name -> "http://www.olympuschess.com/egtb/gaviota/" + name, targetFolder + name )

let awaitTask = Async.AwaitIAsyncResult >> Async.Ignore

let fetchAndSave (s,t) =
    async {
        printfn "Starting with %s..." s
        let! result = Http.AsyncRequestStream(s)
        use fileStream = new System.IO.FileStream(t,System.IO.FileMode.Create)
        do! awaitTask (result.ResponseStream.CopyToAsync(fileStream))
        printfn "Done with %s." s
    }

let makeBatches n jobs =
    let rec collect i jl acc =
        match i,jl with
        | 0, _ -> acc,jl
        | _, [] -> acc,jl
        | _, x::xs -> collect (i-1) (xs) (acc @ [x])
    let rec loop remaining acc =
        match remaining with
        | [] -> acc
        | x::xs ->
            let r,rest = collect n remaining []
            loop rest (acc @ [r])
    loop jobs []


let download () = 
    downloadUrls 
    |> List.map fetchAndSave
    |> makeBatches 2
    |> List.iter (fun l -> l |> Async.Parallel |> Async.RunSynchronously |> ignore )
    |> ignore

download()

Note: I updated the code so that it creates batches of 2 downloads at a time; only the first batch works. I also added the awaitTask from the first answer, as this seems the right way to do it.

News: What is also funny is that if I interrupt the stalled script and then #load it again into the same instance of fsi.exe, it stalls right away. I am starting to think it is a bug in the library I use, or something like that.

Thanks in advance!

Upvotes: 3

Views: 293

Answers (1)

Kevin

Reputation: 2291

Here fetchAndSave has been modified to handle the Task returned from CopyToAsync asynchronously. In your version you are waiting on the Task synchronously. Your script will appear to lock up because you are using Async.RunSynchronously to run the whole workflow. However, the files do download as expected in the background.

let awaitTask = Async.AwaitIAsyncResult >> Async.Ignore

let fetchAndSave (s,t) = async {
    let! result = Http.AsyncRequestStream(s)
    use fileStream = new System.IO.FileStream(t,System.IO.FileMode.Create)
    do! awaitTask (result.ResponseStream.CopyToAsync(fileStream))
}
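
As a possible alternative to the awaitTask helper, here is a minimal sketch that assumes F# 4.0 or later, where Async.AwaitTask also accepts a non-generic Task. Unlike Async.AwaitIAsyncResult, which only waits for completion, Async.AwaitTask should surface an exception from a faulted task inside the workflow. The copyAsync helper and the in-memory streams are hypothetical stand-ins for the HTTP response stream and the file stream, so the snippet runs without network access.

```fsharp
open System.IO
open System.Threading.Tasks

// Hypothetical helper (not from the original post): copy one stream to
// another inside an async workflow.
let copyAsync (source: Stream) (target: Stream) =
    async {
        // The annotation makes it explicit that this is a non-generic Task.
        let copyTask : Task = source.CopyToAsync(target)
        // Assumes F# 4.0+, where Async.AwaitTask has an overload for Task.
        do! Async.AwaitTask copyTask
    }

// In-memory streams stand in for the response stream and the file stream.
let source = new MemoryStream([| 1uy; 2uy; 3uy |])
let target = new MemoryStream()
copyAsync source target |> Async.RunSynchronously
let copied = target.ToArray()
```

In fetchAndSave, the same idea would replace the awaitTask line with do! Async.AwaitTask (result.ResponseStream.CopyToAsync(fileStream)), under the same version assumption.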

Of course you also need to call

do download()

on the last line of your script to kick things into motion.
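
The batching step from the question can also be sketched more compactly. This assumes F# 4.0 or later for List.chunkBySize; fakeJob and runInBatches are hypothetical names used only for illustration, with fakeJob standing in for a real download. Each batch runs in parallel, and the batches run one after another.

```fsharp
// Hypothetical stand-in for a download job: returns its input doubled.
let fakeJob (n: int) = async { return n * 2 }

// Split the jobs into batches of `size`, run each batch in parallel,
// and wait for a batch to finish before starting the next one.
let runInBatches size (jobs: Async<'a> list) =
    jobs
    |> List.chunkBySize size
    |> List.collect (fun batch ->
        batch |> Async.Parallel |> Async.RunSynchronously |> List.ofArray)

let results = [1..5] |> List.map fakeJob |> runInBatches 2
```

Async.Parallel returns results in the order of its inputs, so the combined list keeps the order of the original jobs.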

Upvotes: 2
