DeeKayy90

Reputation: 857

Multithreading on multiple datatables

TL;DR: Why is it that a single-threaded application runs this code in 80,007ms, whereas multithreaded it takes over three times as long?

Bit of a long-winded question and possibly could result in a much simpler resolution than what I've found.

I'm trying to split one DataTable into x DataTables, where x is a hard-coded variable. That's not where the issue arises, although if anyone has a cleaner solution than mine, I'd very much appreciate help on that front too.

My issue is that, even though I spin up x BackgroundWorkers, my timings show it's not advantageous to break the main table into multiple tables.

The idea behind this is simple - we have an application that can only run a certain number of concurrent connections, let's say 10. Given an initial DataTable of, say, 150,000 rows, with 10 connections available I may as well make 10 DataTables of 15,000 rows each and process each table on its own connection, rather than pushing all 150,000 rows through a single DataTable under one connection.

So far, this is what I've come up with:

Private Sub CheckJobcodesPendingUpdate()
    Jobcode_AlreadyTried = New List(Of Integer)
    Dim sw = Stopwatch.StartNew()
    RTB.AppendText("I'm starting..." & vbCrLf)

    Dim Jobcodes As DataTable = SQL.SQLdataTable("SELECT [Jobcode] FROM [database].[schema].[Jobcodes]")
    sw.Stop
    RTB.AppendText("Took " & sw.ElapsedMilliseconds & "ms to retrieve " & Jobcodes.Rows.Count & " rows." & vbCrLf)

    Application.DoEvents

    sw = Stopwatch.StartNew()

    Dim ds As New DataSet
    Dim dt As DataTable

    'Integer division (\) so we get a whole-number chunk size. 10 = the number of simultaneous connections we can have.
    Dim tableSeparator As Integer = Jobcodes.Rows.Count \ 10
    Dim tableCount As Integer = CInt(Math.Ceiling(Jobcodes.Rows.Count / tableSeparator))

    Do Until tableCount = 0
        dt = (From t In Jobcodes.AsEnumerable() Order By t.Item("Jobcode") Ascending Select t).Take(tableSeparator).CopyToDataTable
        For each row As DataRow In dt.Rows
            Jobcodes.Rows.Remove(Jobcodes.AsEnumerable.First(Function(r) r.Item("Jobcode") = row.Item("Jobcode")))
        Next
        ds.Tables.Add(dt)
        tableCount -= 1
    Loop

    sw.Stop

    RTB.AppendText(vbCrLf & "Took " & sw.ElapsedMilliseconds & "ms to create all " & ds.Tables.Count & " tables.")

    For each table As DataTable In ds.Tables
        Dim WorkerJobcodes As New BackgroundWorker
        AddHandler WorkerJobcodes.DoWork, AddressOf Async_Project
        AddHandler WorkerJobcodes.RunWorkerCompleted, AddressOf WorkCompleted
        WorkerJobcodes.RunWorkerAsync(table)
    Next

End Sub

I'm not a fan of dumping a code block and asking 'solve this'. This is the main method that gets called; the BackgroundWorker simply processes each of the rows into the system.
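On the splitting side (separate from the threading question), a cleaner way to carve the table up is to sort once and slice with Skip/Take, instead of removing rows from the source one at a time. A minimal sketch, assuming [Jobcode] is an Integer column; the SplitTable name is just for illustration:

```vbnet
Imports System.Data
Imports System.Linq

Private Function SplitTable(source As DataTable, chunkCount As Integer) As List(Of DataTable)
    'Sort once up front, rather than re-querying the table on every pass.
    Dim sortedRows = source.AsEnumerable().
        OrderBy(Function(r) r.Field(Of Integer)("Jobcode")).
        ToList()

    Dim chunkSize As Integer = CInt(Math.Ceiling(sortedRows.Count / CDbl(chunkCount)))
    Dim result As New List(Of DataTable)

    For i As Integer = 0 To sortedRows.Count - 1 Step chunkSize
        'Skip/Take carves out each slice without mutating the source table.
        result.Add(sortedRows.Skip(i).Take(chunkSize).CopyToDataTable())
    Next

    Return result
End Function
```

This avoids the O(n²) cost of calling Jobcodes.Rows.Remove inside a First() search for every row moved.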

This all works, but when I timed it using the 10 separate BackgroundWorkers it took 262,597ms, whereas on a single main thread it took 80,007ms.

Am I misunderstanding the concept of the BackgroundWorker, hence my performance hit? Or am I using the wrong tool/incorrectly for the job?

Thanks in advance.

Upvotes: 0

Views: 309

Answers (1)

Enigmativity

Reputation: 117029

Pushing work to threads like this is useful when you are performing CPU-bound operations, but when you're doing IO you create resource contention that slows the whole thing down. With multiple threads your hard drive is being asked to access different parts of the disk simultaneously; with a single thread it is free to read the data sequentially, avoiding all of the seek delays.
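If you do want some overlap for the IO-bound part, a middle ground is to cap the degree of parallelism rather than launching one worker per chunk all contending at once. A minimal sketch using Parallel.ForEach - ProcessRow here is hypothetical, standing in for whatever Async_Project does with each row:

```vbnet
Imports System.Data
Imports System.Threading.Tasks

Private Sub ProcessAllRows(Jobcodes As DataTable)
    'Limit concurrency so the workers don't all fight over the same IO resource.
    Dim options As New ParallelOptions With {.MaxDegreeOfParallelism = 2}

    Parallel.ForEach(Jobcodes.AsEnumerable(), options,
        Sub(row)
            'ProcessRow is a placeholder for the per-row work done in Async_Project.
            ProcessRow(row)
        End Sub)
End Sub
```

Measure with different MaxDegreeOfParallelism values; for IO-bound work the sweet spot is often far below the number of chunks.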

Upvotes: 3
