Hunter McMillen

Reputation: 61510

Divide work among processes or threads?

I am interning for a company this summer, and I got passed down this program which is a total piece. It does very computationally intensive operations throughout most of its duration. It takes about 5 minutes to complete a run on a small job, and the guy I work with said that the larger jobs have taken up to 4 days to run. My job is to find a way to make it go faster. My idea was to split the input in half and pass the halves to two new threads or processes. I was wondering if I could get some feedback on how effective that might be, and on whether threads or processes are the way to go.

Any input would be welcome. Hunter

Upvotes: 2

Views: 2057

Answers (8)

Brian Gideon

Reputation: 48949

Personally, I would invest my effort into profiling the application first. You will gain a much better awareness of where the problem spots are before attempting a fix. You can parallelize this problem all day long, but that will only give you a constant-factor speedup, bounded by the number of cores (assuming it can be parallelized at all). But if you can figure out how to transform the solution into something that takes O(n) operations instead of O(n^2), for example, then you have hit the jackpot. I guess what I am saying is that you should not necessarily focus on parallelization.

You might find spots that loop through collections to find specific items; those loops can be turned into hash-table lookups. You might find spots that sort frequently; those repeated sorts can be replaced by a single binary search tree (SortedDictionary), which maintains a sorted collection efficiently through many add/remove operations. And maybe you will find spots that repeatedly make the same calculations; you can cache those results and look them up later when needed.
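The last two ideas (caching repeated calculations, and replacing linear scans with hash lookups) can be sketched quickly; here in Python for brevity, since the idea is language-agnostic. `expensive`, `records`, and `index` are made-up names for illustration only:

```python
from functools import lru_cache

# Hypothetical expensive function; stands in for a repeated calculation.
@lru_cache(maxsize=None)
def expensive(n):
    return sum(i * i for i in range(n))

expensive(10)  # computed once
expensive(10)  # second call is a cached O(1) lookup

# Replacing a linear scan with a hash-table lookup:
records = [{"id": i, "value": i * 10} for i in range(1000)]

# O(n) per query -- scans the list every time:
match = next(r for r in records if r["id"] == 500)

# O(1) per query, after building an index once:
index = {r["id"]: r for r in records}
assert index[500] is match
```

If the 4-day jobs hit the same calculations or lookups millions of times, a change like this can dwarf anything parallelization buys you.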

Upvotes: 0

aL3891

Reputation: 6275

I'd take a strong look at the TPL that was introduced in .NET 4 :) PLINQ might be especially useful for easy speedups.

Generally speaking, splitting work into different processes (exe files) is inadvisable for performance, since starting a process is expensive. Processes do have other merits, such as isolation (if part of the program crashes), but I don't think those apply to your problem.
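The PLINQ style of "parallel map over a collection" can be sketched in any language; here is a rough Python analogue using `concurrent.futures` (note the caveat that CPython's GIL limits CPU-bound gains from threads, unlike .NET, so this shows the structure rather than a guaranteed speedup; `work` is a made-up stand-in function):

```python
from concurrent.futures import ThreadPoolExecutor

def work(x):
    # Stand-in for one computationally intensive step.
    return x * x

inputs = list(range(100))

# Sequential baseline:
sequential = [work(x) for x in inputs]

# Parallel map over a worker pool; in .NET, PLINQ's
# inputs.AsParallel().Select(work) plays a similar role.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(work, inputs))

assert parallel == sequential  # same results, order preserved
```

The appeal of the PLINQ approach is exactly this shape: you keep the sequential code as the mental model and let the library distribute items across cores.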

Upvotes: 3

Matthew Cox

Reputation: 13672

Well, if the problem has a parallel solution, then this is the right way to go, and it can (though not always) significantly increase performance.

However, you can't really spawn additional processes except by running an app that launches multiple mini apps ... which is not going to help you with this problem.

You are going to need to utilize multiple threads. There is a pretty cool library added to .NET 4 for parallel programming that you should take a look at: the Parallel class in the System.Threading.Tasks namespace.

Edit: I would definitely suggest, though, that you think about whether a sequential solution might fit better. Sometimes parallel solutions take even longer. It all depends on the problem in question.
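The original "split the input in half and hand each half to a thread" idea from the question looks roughly like this; sketched in Python (in CPython the GIL means you would use processes for CPU-bound work, but in .NET two threads are the right tool, and the splitting pattern is the same; `process_chunk` and the doubling step are made-up placeholders):

```python
import threading

def process_chunk(chunk, results, idx):
    # Placeholder per-item computation; the real program would do
    # its expensive work here and store the chunk's result.
    results[idx] = [x * 2 for x in chunk]

data = list(range(10))
mid = len(data) // 2
results = [None, None]

# One thread per half of the input.
t1 = threading.Thread(target=process_chunk, args=(data[:mid], results, 0))
t2 = threading.Thread(target=process_chunk, args=(data[mid:], results, 1))
t1.start()
t2.start()
t1.join()
t2.join()

combined = results[0] + results[1]
assert combined == [x * 2 for x in data]
```

Note that this only works cleanly if the halves are independent; any shared mutable state between the two threads needs synchronization.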

Upvotes: 1

JaCraig

Reputation: 1078

If you need to communicate/pass data, go with threads (and if you can go .NET 4, use the Task Parallel Library as others have suggested). If you don't need to pass much information, I suggest processes: they scale a bit better on multiple cores, and you gain the ability to use multiple computers in a client/server setup (the server passes work to clients and gets a response back, but other than that there is not much info passing).

Upvotes: 0

NotJarvis

Reputation: 1247

The best first step when optimising code, IMO, is always to profile it to find out where the logjams are.

Sometimes you can find non-obvious, huge speed increases with little effort.

EQATEC and SlimTune are two free C# profilers which may be worth trying out.

(Of course the other comments about which parallelization architecture to use are spot on - it's just that I prefer analysis first.)

Upvotes: 1

Kurru

Reputation: 14331

Use threads if there's lots of memory sharing in your code, but if you think you'll want to scale the program to run across multiple computers (when required cores > 16), then develop it using processes with a client/server model.

Upvotes: 2

Bart van Heukelom

Reputation: 44104

If the jobs are splittable, then going multithreaded/multiprocessed will bring better speed. That is assuming, of course, that the computer they run on actually has multiple cores/cpus.

Whether you use threads or processes doesn't really matter for speed (as long as the threads don't share data). The only reason I know of to use processes is when a job is likely to crash the entire process, which is not likely in .NET.

Upvotes: 2

Jon

Reputation: 437424

Have a look at the Task Parallel Library -- this sounds like a prime candidate problem for using it.

As for the threads vs processes dilemma: threads are fine unless there is a specific reason to use processes (e.g. if you were using buggy code that you couldn't fix, and you did not want a bad crash in that code to bring down your whole process).

Upvotes: 1
