Reputation: 2865
I am reading a large XML file using XmlReader and am exploring potential performance improvements via Async & pipelining. The following initial foray into the world of Async is showing that the Async version (which for all intents and purposes at this point is the equivalent of the Synchronous version) is much slower. Why would this be? All I've done is wrapped the "normal" code in an Async block and called it with Async.RunSynchronously
open System
open System.IO.Compression // support assembly required + FileSystem
open System.Xml // support assembly required
let readerNormal (reader:XmlReader) =
let temp = ResizeArray<string>()
while reader.Read() do
()
temp
let readerAsync1 (reader:XmlReader) =
async{
let temp = ResizeArray<string>()
while reader.Read() do
()
return temp
}
let readerAsync2 (reader:XmlReader) =
async{
while reader.Read() do
()
}
[<EntryPoint>]
let main argv =
let path = @"C:\Temp\LargeTest1000.xlsx"
use zipArchive = ZipFile.OpenRead path
let sheetZipEntry = zipArchive.GetEntry(@"xl/worksheets/sheet1.xml")
let stopwatch = System.Diagnostics.Stopwatch()
stopwatch.Start()
let sheetStream = sheetZipEntry.Open() // again
use reader = XmlReader.Create(sheetStream)
let temp1 = readerNormal reader
stopwatch.Stop()
printfn "%A" stopwatch.Elapsed
System.GC.Collect()
let stopwatch = System.Diagnostics.Stopwatch()
stopwatch.Start()
let sheetStream = sheetZipEntry.Open() // again
use reader = XmlReader.Create(sheetStream)
let temp1 = readerAsync1 reader |> Async.RunSynchronously
stopwatch.Stop()
printfn "%A" stopwatch.Elapsed
System.GC.Collect()
let stopwatch = System.Diagnostics.Stopwatch()
stopwatch.Start()
let sheetStream = sheetZipEntry.Open() // again
use reader = XmlReader.Create(sheetStream)
readerAsync2 reader |> Async.RunSynchronously
stopwatch.Stop()
printfn "%A" stopwatch.Elapsed
printfn "DONE"
System.Console.ReadLine() |> ignore
0 // return an integer exit code
I am aware that the above Async code does not do any actual Async work - what I a trying to ascertain here is the overhead of simply making it Async
I don't expect it to go faster just because I've wrapped it in an Async. My question is the opposite: why the dramatic (IMHO) slowdown.
A comment below correctly pointed out that I should provide timings for datasets of various sizes which is implicitly what had led me to be asking this question in the first instance.
The following are some times based on small vs large datasets. While the absolute values are not too meaningful, the relativities are interesting:
30 elements (small dataset)
Normal: 00:00:00.0006994
Async1: 00:00:00.0036529
Async2: 00:00:00.0014863
(A lot slower but presumably indicative of Async setup costs - this is as expected)
1.5 million elements
Normal: 00:00:01.5749734
Async1: 00:00:03.3942754
Async2: 00:00:03.3760785
(~ 2x slower. Surprised that the difference in timing is not amortized as the dataset gets bigger. If this is the case, then pipelining/parallelization can only improve performance here if you have more than two cores - to outweigh the overhead that I can't explain...)
Upvotes: 1
Views: 590
Reputation: 63772
There's no asynchronous work to do. In effect, all you get is the overheads and no benefits. async {}
doesn't mean "everything in the braces suddenly becomes asynchronous". It simply means you have a simplified way of using asynchronous code - but you never call a single asynchronous function!
Additionaly, "asynchronous" doesn't necessarily mean "parallel", and it doesn't necessarily involve multiple threads. For example, when you do an asynchronous request to read a file (which you're not doing here), it means that the OS is told what you want to be done, and how you should be notified when it is done. When you run code like this using RunSynchronously
, you're simply blocking one thread while posting asynchronous file requests - a scenario pretty much identical to using synchronous file requests in the first place.
The moment you do RunSynchronously
, you throw away any reason whatsoever to use asynchronous code in the first place. You're still using a single thread, you just blocked another thread at the same time - instead of saving on threads, you waste one, and add another to do the real work.
EDIT:
Okay, I've investigated with the minimal example, and I've got some observations.
async
code doesn't actually do anything asynchronous, it creates a ton of async builder helpers, sprinkles a ton of (asynchronous) Delay
calls through the code, and going into outright absurd territory, each iteration of the loop is an extra method call, including the setup of a helper object.Apparently, F# automatically translates while
into an asynchronous while
. Now, given how well compressed xslt
data usually is, very little I/O is involved in those Read
operations, so the overhead absolutely dominates - and since every iteration of the "loop" has its own setup cost, the overhead scales with the amount of data.
While this is mostly caused by the while
not actually doing anything, it also obviously means that you need to be careful about what you select as async
, and you need to avoid using it in a case where CPU time dominates (as in this case - after all, both the async and non-async case are almost 100% CPU tasks in practice). This is further worsened by the fact that Read
reads a single node at a time - something that's relatively trivial even in a big, non-compressed xml file. The overheads absolutely dominate. In effect, this is analogous to using Parallel.For
with a body like sum += i
- the setup cost of each of the each of the iterations absolutely dwarfs any actual work being done.
The CPU profiling makes this rather obvious - the two most work intensive methods are:
XmlReader.Read
(expected)Thread::intermediateThreadProc
- also known as "this code runs on a thread pool thread". The overhead from this in a no-op code like this is around 100% - yikes. Apparently, even though there is no real asynchronicity anywhere, the callbacks are never run synchronously. Every iteration of the loop posts work to a new thread pool thread.The lesson learned? Probably something like "don't use loops in async
if the loop body does very little work". The overhead is incurred for each and every iteration of the loop. Ouch.
Upvotes: 7
Reputation: 233197
Asynchronous code doesn't magically make your code faster. As you've discovered, it'll tend to make isolated code slower, because there's overhead involved with managing the asynchrony.
What it can do is to be more efficient, but that's not the same as being inherently faster. The main purpose of Async
is to make Input/Output code more efficient.
If you invoke a 'slow', blocking I/O operation directly, you'll block the thread until the operation returns.
If you instead invoke that slow operation asynchronously, it may free up the thread to do other things. It does require that there's an underlying implementation that's not thread-bound, but uses another mechanism for receiving the response. I/O Completion Ports could be such a mechanism.
Now, if you run a lot of asynchronous code in parallel, it may turn out to be faster than attempting to run the blocking implementation in parallel, because the async versions use fewer resources (fewer threads = less memory).
Upvotes: 2