Reputation: 21
I have a C# (.NET 3.5) application which imports thousands of files. Right now I create a BackgroundWorker for each file. It works well up to a certain limit, and then the application dies with an out-of-memory exception. I am assuming this is happening because of the large number of threads. Is a thread pool a good solution for this situation?
The exception is:
System.OutOfMemoryException | Exception of type 'System.OutOfMemoryException' was thrown.
at System.Data.RBTree`1.TreePage..ctor(Int32 size)
at System.Data.RBTree`1.AllocPage(Int32 size)
at System.Data.RBTree`1.InitTree()
at System.Data.Index.InitRecords(IFilter filter)
at System.Data.Index..ctor(DataTable table, Int32[] indexDesc, IndexField[] indexFields, Comparison`1 comparison, DataViewRowState recordStates, IFilter rowFilter)
at System.Data.DataTable.GetIndex(IndexField[] indexDesc, DataViewRowState recordStates, IFilter rowFilter)
at System.Data.DataColumn.get_SortIndex()
at System.Data.DataColumn.IsNotAllowDBNullViolated()
at System.Data.DataTable.EnableConstraints()
at System.Data.DataTable.set_EnforceConstraints(Boolean value)
at System.Data.DataTable.EndLoadData()
at System.Data.Common.DataAdapter.FillFromReader(DataSet dataset, DataTable datatable, String srcTable, DataReaderContainer dataReader, Int32 startRecord, Int32 maxRecords, DataColumn parentChapterColumn, Object parentChapterValue)
at System.Data.Common.DataAdapter.Fill(DataTable[] dataTables, IDataReader dataReader, Int32 startRecord, Int32 maxRecords)
at System.Data.Common.DbDataAdapter.FillInternal(DataSet dataset, DataTable[] datatables, Int32 startRecord, Int32 maxRecords, String srcTable, IDbCommand command, CommandBehavior behavior)
at System.Data.Common.DbDataAdapter.Fill(DataTable[] dataTables, Int32 startRecord, Int32 maxRecords, IDbCommand command, CommandBehavior behavior)
at System.Data.Common.DbDataAdapter.Fill(DataTable dataTable)
at Dms.Data.Adapters.DataTableAdapterBase`2.FillByCommand(TTbl table, DbCommand command)
Upvotes: 1
Views: 1171
Reputation: 18126
If I have understood you right, you need to implement a producer-consumer approach: 1) one producer produces the list of files to be imported; 2) several (a fixed number of) consumers perform the import.
To achieve this you can use BlockingCollection<T> (available since .NET 4.0). There's an example in the documentation.
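A minimal sketch of the idea (requires .NET 4.0; the directory path and ImportFile method below are placeholders, not from your code):

```csharp
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

class Importer
{
    static void Main()
    {
        // Bounded capacity: Add blocks when the queue is full, so the
        // producer can't race far ahead of the consumer.
        var queue = new BlockingCollection<string>(boundedCapacity: 100);

        // Producer: enumerate the files to import.
        var producer = Task.Factory.StartNew(() =>
        {
            foreach (string path in Directory.EnumerateFiles(@"C:\import"))
                queue.Add(path);
            queue.CompleteAdding(); // signal that no more items are coming
        });

        // Consumer: GetConsumingEnumerable blocks until an item is available
        // and finishes once CompleteAdding has been called and the queue drains.
        foreach (string path in queue.GetConsumingEnumerable())
            ImportFile(path);

        producer.Wait();
    }

    static void ImportFile(string path) { /* placeholder for the actual import */ }
}
```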
Upvotes: 0
Reputation: 3799
I think it is a good choice. However, the BackgroundWorker has been somewhat superseded by the .NET 4 framework's Task Parallel Library (TPL), which optimises based on the number of processors on your machine and dishes the work out accordingly. Perhaps you could use the TPL with a parallel loop: you can cap the maximum number of concurrent thread pool threads in order to limit how many files you import at once, in batches, e.g. (assuming a files collection and an ImportFile method):
ParallelOptions options = new ParallelOptions();
options.MaxDegreeOfParallelism = 4; // at most four imports run concurrently
Parallel.ForEach(files, options, file => ImportFile(file));
This might help you.
Upvotes: 1
Reputation: 768
It can be, yes. Consider that there are only a finite number of CPUs (or cores), and only that many threads can actually be running concurrently. You could usefully have more threads active, say if many of them will be waiting on some other process running on a different computer (for instance if you're downloading these files). Just because you have a separate thread doesn't mean it's adding concurrency; it may just be adding context-switching costs and memory allocation (as you've seen). Depending on the amount of idle time, try limiting your pool to just slightly more threads than CPUs, and tweak from there.
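For example, on .NET 3.5 you could cap the worker pool yourself before queuing the imports. This is just a sketch; the "+ 2" headroom and the file list are illustrative, and ImportFile stands in for your real import routine:

```csharp
using System;
using System.Threading;

class Program
{
    static void Main()
    {
        // Slightly more worker threads than cores, since imports are
        // partly I/O-bound; leave the completion-port limit at its default.
        int workers = Environment.ProcessorCount + 2;
        int currentWorkers, completionPorts;
        ThreadPool.GetMaxThreads(out currentWorkers, out completionPorts);
        ThreadPool.SetMaxThreads(workers, completionPorts);

        foreach (string file in new[] { "a.dat", "b.dat" }) // placeholder file list
            ThreadPool.QueueUserWorkItem(state => ImportFile((string)state), file);
    }

    static void ImportFile(string path) { /* placeholder */ }
}
```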
Upvotes: 1
Reputation: 564821
The problem is most likely that you're trying to load too many files at one time.
Using a ThreadPool may help, as it could give you a means of limiting the processing. However, if you're importing and processing "thousands of files", the appropriate means may be to create a pipeline to handle your processing, and then fill the pipeline (or a certain number of them) with your files. This would let you control the amount of concurrency, and prevent too many individual files from being processed at the same time. It could keep your memory and processing requirements to a more reasonable level.
Edit:
Since you (now) mentioned that you're using C#: the BackgroundWorker actually does use the ThreadPool. Switching to using the thread pool directly may still be a good idea, but it likely won't solve the issue entirely. You may want to consider using something like BlockingCollection<T> to set up a producer/consumer queue. You could then have one or more threads "consume" the files and process them, and just add all of the files to the BlockingCollection<T>. This would give you control over how many files are handled at once (just add another thread for processing as you can).
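One way that could look with a fixed set of consumer threads (again .NET 4; GetFilePaths and ImportFile are stand-ins for however you enumerate and import your files):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

class FileImporter
{
    // Bounded so Add applies back-pressure once 50 files are queued.
    static readonly BlockingCollection<string> Files =
        new BlockingCollection<string>(boundedCapacity: 50);

    static void Main()
    {
        const int consumerCount = 4; // raise this to add another processing thread
        var consumers = new Thread[consumerCount];
        for (int i = 0; i < consumerCount; i++)
        {
            consumers[i] = new Thread(Consume);
            consumers[i].Start();
        }

        foreach (string path in GetFilePaths())
            Files.Add(path);
        Files.CompleteAdding(); // lets the consumers' loops finish

        foreach (Thread t in consumers)
            t.Join();
    }

    static void Consume()
    {
        // Each consumer takes files until the collection is complete and drained.
        foreach (string path in Files.GetConsumingEnumerable())
            ImportFile(path);
    }

    static IEnumerable<string> GetFilePaths() { yield break; } // stand-in
    static void ImportFile(string path) { /* the actual import */ }
}
```

With this shape, at most consumerCount files are ever being imported at the same time, regardless of how many thousands are queued.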
Upvotes: 4