Jay

Reputation: 373

Quickest way to process a large number of files with thousands of rows in each file

I need to process data from a large number of files, each containing thousands of rows. Earlier I was reading each whole file row by row and processing it, and it took a lot of time to process all the files once their number increased. Then someone said that threads can be used to perform the task in less time. Can threading make this process faster? I'm using C#.

Upvotes: 2

Views: 2821

Answers (5)

DarthVader

Reputation: 55032

I would recommend you do batch inserts to your database.

You can have one thread that reads lines into a concurrent queue while another thread pulls the data from that queue, aggregates it (or does whatever other operation you need on it), and then batch inserts the data into the database. That will save you quite a bit of time.

Inserting one line at a time into the DB would be very slow; you have to do batch inserts.
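A rough sketch of that pipeline (a BlockingCollection wraps a ConcurrentQueue and adds completion signalling; `InsertBatch` is a placeholder for whatever batch mechanism you use, such as SqlBulkCopy or a single multi-row INSERT):

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

class BatchInsertPipeline
{
    const int BatchSize = 1000;   // tune by measuring

    static void Run(string path)
    {
        // Bounded so the reader cannot run arbitrarily far ahead of the writer.
        var queue = new BlockingCollection<string>(boundedCapacity: 10000);

        // Reader thread: pushes lines onto the concurrent queue.
        var reader = Task.Run(() =>
        {
            foreach (var line in File.ReadLines(path))
                queue.Add(line);
            queue.CompleteAdding();
        });

        // Writer thread: pulls lines, aggregates them, and flushes in batches.
        var writer = Task.Run(() =>
        {
            var batch = new List<string>(BatchSize);
            foreach (var line in queue.GetConsumingEnumerable())
            {
                batch.Add(line);
                if (batch.Count >= BatchSize)
                {
                    InsertBatch(batch);   // one round trip per 1000 rows instead of per row
                    batch.Clear();
                }
            }
            if (batch.Count > 0)
                InsertBatch(batch);       // flush the remainder
        });

        Task.WaitAll(reader, writer);
    }

    // Hypothetical helper: a single multi-row INSERT or a SqlBulkCopy call would go here.
    static void InsertBatch(IList<string> rows) { }
}
```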

Upvotes: 1

Alexei Levenkov

Reputation: 100547

A good approach to a performance question is to assume that your code is doing something unnecessary and try to find out what that is: measure, review, draw diagrams, whatever works for you. I'm not saying the code you have is slow; it's just a way to look at it.

If you add multithreading to the mix first, you may find the code much harder to analyze.

More concretely for your task: combining multiple similar operations (like reading a record from a file or committing to the DB) may save a significant amount of time (you need to prototype and measure).
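For instance, a minimal way to compare a prototype against the current version before committing to either (`ProcessPerRow` and `ProcessBatched` stand in for your own code):

```csharp
using System;
using System.Diagnostics;

class MeasureFirst
{
    static void Compare(string[] files)
    {
        var sw = Stopwatch.StartNew();
        ProcessPerRow(files);                        // the current row-by-row version
        Console.WriteLine("per-row: " + sw.Elapsed);

        sw.Restart();
        ProcessBatched(files);                       // prototype that combines similar operations
        Console.WriteLine("batched: " + sw.Elapsed);
    }

    // Hypothetical variants of the same job; only the timing scaffold matters here.
    static void ProcessPerRow(string[] files) { }
    static void ProcessBatched(string[] files) { }
}
```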

Upvotes: 1

Jerry Coffin

Reputation: 490158

Threading is one way (there are others) of letting you overlap the processing with the I/O. That means instead of the total time being the sum of the time to read the data and the time to process the data, you can reduce it to (roughly) whichever of the two is larger (usually the I/O time).

If you mostly want to overlap the I/O time, you might want to look at overlapped I/O and/or I/O completion ports.

Edit: If you're going to do this, you normally want to base the number of I/O threads on the number of separate physical disks you're going to be reading from, and the number of processing threads on the number of processors you have available to do the processing (but only as many as necessary to keep up with the data being supplied by the reader thread). For a typical desktop machine, that will often mean only two threads, one to read and one to process data.
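In C#, the managed counterpart of overlapped I/O is a FileStream opened for asynchronous access (on Windows the reads are serviced through the thread pool's I/O completion ports). A minimal sketch of overlapping the read of the next line with processing of the current one might look like this (`ProcessLine` stands in for your own work):

```csharp
using System.IO;
using System.Threading.Tasks;

class OverlappedReadSketch
{
    static async Task ProcessFileAsync(string path)
    {
        // useAsync: true opens the file for asynchronous (overlapped) I/O.
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                           FileShare.Read, 4096, useAsync: true))
        using (var reader = new StreamReader(stream))
        {
            Task<string> pendingRead = reader.ReadLineAsync();
            while (true)
            {
                string line = await pendingRead;
                if (line == null)
                    break;

                pendingRead = reader.ReadLineAsync(); // start the next read...
                ProcessLine(line);                    // ...and process the current line while it is in flight
            }
        }
    }

    // Hypothetical per-line work; replace with your own parsing/aggregation.
    static void ProcessLine(string line) { }
}
```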

Upvotes: 0

user1241335

Reputation:

Yes, using threads can speed things up.
Threads are to be used when you have time-consuming tasks you can run in the background (for example, when you process, say, 10 files, you can have a thread process each of them, which will be a lot faster than processing them all on your main thread).

Please note that there may be related bugs, so you should make sure all threads have finished running before continuing and trying to access what they produced.

Look up "C#.NET multithreading": any thread can run a specified function, and BackgroundWorker is a nice class as well (though I prefer plain multithreading).

Also note that this may backfire and wind up slower, but it's a good idea to try.
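For instance, one Task per file and a Task.WaitAll before touching the results, roughly like this (`ProcessFile`, `Result`, and `UseResult` are placeholders for your own code):

```csharp
using System.IO;
using System.Linq;
using System.Threading.Tasks;

class PerFileTasks
{
    static void Run(string inputDirectory)
    {
        string[] files = Directory.GetFiles(inputDirectory);

        // One task per file, scheduled on the thread pool.
        Task<Result>[] tasks = files
            .Select(file => Task.Run(() => ProcessFile(file)))
            .ToArray();

        // Wait for every task to finish before touching the results.
        Task.WaitAll(tasks);

        foreach (var task in tasks)
            UseResult(task.Result);
    }

    // Hypothetical per-file work and result type.
    class Result { }
    static Result ProcessFile(string path) { return new Result(); }
    static void UseResult(Result r) { }
}
```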

Upvotes: 0

Gray

Reputation: 116888

It certainly can, although it depends on the particular job in question. A very common pattern is to have one thread doing the file I/O and multiple threads processing the actual lines.

How many processing threads to start will depend on how many processors/cores you have on your system, and on how the results of the processing get written out. If the processing time per line is very small, however, you probably won't get much speed improvement from multiple processing threads, and a single processing thread would be optimal.
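A rough sketch of that pattern, assuming a directory of input files and with `ProcessLine` standing in for your per-line work:

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

class ReaderWorkerPool
{
    static void Run(string inputDirectory)
    {
        // Bounded so the reader cannot run arbitrarily far ahead of the workers.
        var lines = new BlockingCollection<string>(boundedCapacity: 10000);

        // Single thread doing the file I/O.
        var reader = Task.Run(() =>
        {
            foreach (var file in Directory.GetFiles(inputDirectory))
                foreach (var line in File.ReadLines(file))
                    lines.Add(line);
            lines.CompleteAdding();
        });

        // One processing worker per core.
        var workers = Enumerable.Range(0, Environment.ProcessorCount)
            .Select(_ => Task.Run(() =>
            {
                foreach (var line in lines.GetConsumingEnumerable())
                    ProcessLine(line);
            }))
            .ToArray();

        Task.WaitAll(workers);
        reader.Wait();
    }

    // Hypothetical per-line work.
    static void ProcessLine(string line) { }
}
```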

Upvotes: 1
