Reputation: 9
I need to process millions of files. Currently, I use a custom thread manager that tracks the threads in a DataGridView and uses a timer to check whether more threads can start; roughly like this (pseudocode):
Private Sub ThreadManager()
    'Start workers until the thread budget is used up
    Do While AvailableThreads > 0
        Dim t As Threading.Thread = New Threading.Thread(AddressOf MyThread)
        t.Start()
        AvailableThreads = AvailableThreads - 1
    Loop
End Sub
This has many drawbacks, the main one being high CPU and memory usage, because each of the above threads processes a full directory instead of each file independently.
So I have rewritten the process. Now I have a class that does the work at the file level and reports the results back to the main thread, like so:
Imports System.IO

Public Class ImportFile
    Public Class ImportFile_state
        Public ID As Long = Nothing
        Public FilePath As String = Nothing
        Public Result As Boolean = False
    End Class

    Public Event ReportState(ByVal state As ImportFile_state)
    Dim _state As ImportFile_state = New ImportFile_state

    Public Sub New(ByVal ID As Long, ByVal FilePath As String)
        MyBase.New()
        _state.ID = ID
        _state.FilePath = FilePath
    End Sub

    Public Sub GetInfo()
        'Do the work here, but just return the result for this demonstration
        Try
            _state.Result = True
        Catch ex As Exception
            _state.Result = False
        Finally
            RaiseEvent ReportState(_state)
        End Try
    End Sub
End Class
The above class works like a charm: it is very fast, uses almost no memory, and barely touches the CPU. However, I have only been able to test it with a few hundred threads created with Threading.Thread.
Now I would like to use ThreadPool.QueueUserWorkItem to run the above class for each file, letting the system control how many threads run at any given time. However, I know I cannot just dump several million work items into the ThreadPool without locking up my server. I have done a lot of research on this, but I have only found examples and discussions that use ThreadPool.QueueUserWorkItem for a handful of work items. What I need is to fire off millions of them.
So, I have two questions: 1) Should I even be trying to use ThreadPool.QueueUserWorkItem to run this much work, and 2) is the code below sufficient to do this without locking up my server?
Here is my code so far:
Dim worker As Integer, io As Integer
For Each subdir As String In Directory.GetDirectories(DirPath)
    For Each fl As String In Directory.GetFiles(subdir)
        'MsgBox(fl)
        Dim f As ImportFile = New ImportFile(0, fl)
        AddHandler f.ReportState, AddressOf GetResult
        ThreadPool.QueueUserWorkItem(New Threading.WaitCallback(AddressOf f.GetInfo))
        'Pause queuing while the pool reports no free worker threads
        ThreadPool.GetAvailableThreads(worker, io)
        Do While worker <= 0
            Thread.Sleep(5000)
            ThreadPool.GetAvailableThreads(worker, io)
        Loop
    Next
Next
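For comparison, here is a minimal sketch of one bounded alternative to the polling loop above. It assumes .NET 4 or later, where Parallel.ForEach, ParallelOptions.MaxDegreeOfParallelism and Directory.EnumerateFiles are available. The names BoundedImport, ProcessFile and ImportAll are hypothetical and are not part of my real code; ProcessFile only stands in for ImportFile.GetInfo. Note that SearchOption.AllDirectories recurses through every subdirectory, whereas the loop above only goes one level deep.

```vb
Imports System.Collections.Generic
Imports System.IO
Imports System.Threading
Imports System.Threading.Tasks

Module BoundedImport
    ' Hypothetical stand-in for ImportFile.GetInfo: True means the file was handled.
    Function ProcessFile(ByVal path As String) As Boolean
        Return path IsNot Nothing AndAlso path.Length > 0
    End Function

    ' Processes the paths with at most maxWorkers running concurrently
    ' and returns how many files reported success.
    Function ImportAll(ByVal files As IEnumerable(Of String), ByVal maxWorkers As Integer) As Integer
        Dim succeeded As Integer = 0
        Dim options As New ParallelOptions()
        options.MaxDegreeOfParallelism = maxWorkers
        Parallel.ForEach(files, options,
            Sub(fl As String)
                ' Interlocked keeps the shared counter correct across worker threads
                If ProcessFile(fl) Then
                    Interlocked.Increment(succeeded)
                End If
            End Sub)
        Return succeeded
    End Function

    Sub Main(ByVal args As String())
        If args.Length = 0 Then Return
        ' EnumerateFiles yields paths lazily instead of building one huge array
        Dim files As IEnumerable(Of String) =
            Directory.EnumerateFiles(args(0), "*", SearchOption.AllDirectories)
        Console.WriteLine(ImportAll(files, Environment.ProcessorCount))
    End Sub
End Module
```

The idea is that the loop itself caps concurrency, so nothing has to sleep and poll GetAvailableThreads, and the file list is consumed as it is produced rather than queued all at once. Whether Environment.ProcessorCount is the right cap for I/O-heavy work is an open assumption.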
Upvotes: 0
Views: 1992