Reputation: 857
TL;DR: Why does a single-threaded version of this code run in 80,007ms, while the multithreaded version takes over three times as long?
Bit of a long-winded question, and possibly one with a much simpler resolution than what I've found.
I'm trying to split one datatable into x smaller datatables, based on a hard-coded variable. That's not where the issue arises, although if anyone has a cleaner solution than mine, I'd very much appreciate help in that aspect too.
My issue arises from the fact that, even though I spin up x BackgroundWorkers, my timings show that it is not advantageous to break the main table into multiple tables.
The idea behind this is simple: we have an application that can run only a certain number of concurrent connections, say 10 simultaneous connections. Given an initial datatable of, say, 150,000 rows, with 10 connections I may as well make 10 datatables of 15,000 rows each and process each table individually, rather than pushing all 150,000 rows through one datatable over a single connection.
So far, this is what I've come up with:
Private Sub CheckJobcodesPendingUpdate()
    Jobcode_AlreadyTried = New List(Of Integer)

    Dim sw = Stopwatch.StartNew()
    RTB.AppendText("I'm starting..." & vbCrLf)
    Dim Jobcodes As DataTable = SQL.SQLdataTable("SELECT [Jobcode] FROM [database].[schema].[Jobcodes]")
    sw.Stop()
    RTB.AppendText("Took " & sw.ElapsedMilliseconds & "ms to retrieve " & Jobcodes.Rows.Count & " rows." & vbCrLf)
    Application.DoEvents()

    sw = Stopwatch.StartNew()
    Dim ds As New DataSet
    Dim dt As DataTable
    'The number of connections we can have simultaneously.
    Dim tableSeparator As Integer = Jobcodes.Rows.Count \ 10
    Dim tableCount As Integer = CInt(Math.Ceiling(Jobcodes.Rows.Count / tableSeparator))

    Do Until tableCount = 0
        'Take the next batch of rows, ordered by Jobcode...
        dt = (From t In Jobcodes Order By t.Item("Jobcode") Ascending Select t).Take(tableSeparator).CopyToDataTable()
        '...then remove those rows from the source table so they aren't taken again.
        For Each row As DataRow In dt.Rows
            Jobcodes.Rows.Remove(Jobcodes.AsEnumerable.First(Function(r) r.Item("Jobcode") = row.Item("Jobcode")))
        Next
        ds.Tables.Add(dt)
        tableCount -= 1
    Loop
    sw.Stop()
    RTB.AppendText(vbCrLf & "Took " & sw.ElapsedMilliseconds & "ms to create all " & ds.Tables.Count & " tables.")

    'Hand each sub-table to its own BackgroundWorker.
    For Each table As DataTable In ds.Tables
        Dim WorkerJobcodes As New BackgroundWorker
        AddHandler WorkerJobcodes.DoWork, AddressOf Async_Project
        AddHandler WorkerJobcodes.RunWorkerCompleted, AddressOf WorkCompleted
        WorkerJobcodes.RunWorkerAsync(table)
    Next
End Sub
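Since you asked for a cleaner split: the remove-rows-one-by-one loop above is roughly O(n²), because every removal rescans the source table. A sketch of an O(n) alternative is to assign each row to a bucket by index with a single LINQ pass (this assumes Jobcode is an Integer column, as List(Of Integer) above suggests; adjust Field(Of ...) if not):

Dim connectionCount As Integer = 10
Dim chunkSize As Integer = CInt(Math.Ceiling(Jobcodes.Rows.Count / CDbl(connectionCount)))
Dim ds As New DataSet
'Order once, then group rows into buckets of chunkSize by their position.
For Each chunk In Jobcodes.AsEnumerable().
        OrderBy(Function(r) r.Field(Of Integer)("Jobcode")).
        Select(Function(row, index) New With {.Row = row, .Index = index}).
        GroupBy(Function(x) x.Index \ chunkSize)
    ds.Tables.Add(chunk.Select(Function(x) x.Row).CopyToDataTable())
Next

This leaves the source table untouched and avoids the repeated First(...) scans, so the split itself should no longer be a measurable part of your timing.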
I'm not a fan of dumping a code block and asking 'solve this'. This is the main method that is called; the BackgroundWorker simply processes each of the rows into the system.
This all works, but when I timed it using the 10 separate BackgroundWorkers it took 262,597ms, whereas on a single, main thread it took 80,007ms.
Am I misunderstanding the concept of the BackgroundWorker, hence the performance hit? Or am I using the wrong tool, or the right tool incorrectly, for the job?
Thanks in advance.
Upvotes: 0
Views: 309
Reputation: 117029
Pushing work to threads like this is useful when you are performing CPU-bound operations, but when the work is IO-bound you create resource contention that slows the whole thing down. With multiple threads, your hard drive is asked to read from different parts of the disk simultaneously; with a single thread, it is free to read the data sequentially, avoiding all of those seek delays.
Upvotes: 3