ggkmath

Reputation: 4246

Approaches to speed up scientific computations on a server accessible by internet users

I'm interested in any conventional wisdom on how to approach the following problem. Note that I'm a hardware guy, so be careful with software industry knowledge/terminology/acronyms.

I'm providing an online application that includes very complex math computations, such as fast Fourier transforms (FFTs), involving nested for-loops and very large data arrays (1.6 GB each). Users on the internet will access this application, enter some custom parameters, and submit a job that calls these math computations. To keep each user's wait to a minimum, and to allow multiple independent sessions for multiple simultaneous users (each user having a separate thread), I'm wondering how I can speed up the math computations, which I anticipate will be the bottleneck.

I'm not so much looking for advice on how to structure the program (e.g. use integer data types instead of floating point whenever possible, use smaller arrays, etc.); rather, I'm interested in what can be done to speed things up further once the program is complete.

For example, how do I ensure that multiple CPU cores are automatically used based on demand? (Is this done by default, or do I need to manage the process somehow?)

Or, how do I do parallel processing (breaking a for-loop up among multiple cores and/or machines)?

Any practical advice is greatly appreciated. I'm sure I'm not the first to need this, so I'm hoping there are industry best practice approaches available that scale with demand.

Thanks in advance!

Upvotes: 2

Views: 131

Answers (2)

Alexandre C.

Reputation: 56976

FFT methods are highly parallelizable, especially in multiple dimensions.

Classical implementations are FFTW and Intel MKL.
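To give a feel for what those libraries handle for you, here is a minimal sketch of a multi-threaded complex FFT using FFTW 3 (the transform size and thread count are arbitrary example values, and FFTW must be built with thread support):

    /* Multi-threaded 1-D complex FFT with FFTW 3.
       Link with: -lfftw3_threads -lfftw3 -lpthread -lm */
    #include <fftw3.h>

    int main(void)
    {
        const int n = 1 << 20;       /* transform size (example value) */

        fftw_init_threads();         /* call once, before creating any plan */
        fftw_plan_with_nthreads(4);  /* plans made after this use 4 threads */

        fftw_complex *in  = fftw_malloc(sizeof(fftw_complex) * n);
        fftw_complex *out = fftw_malloc(sizeof(fftw_complex) * n);

        /* ... fill in[] with the input data ... */

        fftw_plan p = fftw_plan_dft_1d(n, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
        fftw_execute(p);             /* the transform runs across the threads */

        fftw_destroy_plan(p);
        fftw_free(in);
        fftw_free(out);
        fftw_cleanup_threads();
        return 0;
    }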

One approach (depending on the available hardware and configuration) is a pool of worker threads or processes.

At my job, we have had much success with a pool of PCs and as-simple-as-possible data packets, which get queued, computed (in multicore) by one PC, and sent back to the user.

Don't try to micro-optimize the math itself; instead, use one of the above libraries. Focus on designing the packets, queuing the computations (don't forget some kind of quota/priority scheme), and making sure computed data is reliably sent back to the thread that has to join the packets together.
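As a rough illustration of the queuing side (not our production code; the job structure and function names are made up), a minimal worker-pool sketch with POSIX threads might look like this:

    /* Minimal sketch of a worker-thread pool with a job queue (POSIX threads).
       Compile with: cc -pthread pool.c */
    #include <pthread.h>
    #include <stdlib.h>

    typedef struct job {
        void (*run)(void *arg);   /* the computation to perform */
        void *arg;                /* e.g. user parameters + data pointer */
        struct job *next;
    } job;

    static job *head = NULL, *tail = NULL;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t nonempty = PTHREAD_COND_INITIALIZER;

    void submit(void (*run)(void *), void *arg)
    {
        job *j = malloc(sizeof *j);
        j->run = run; j->arg = arg; j->next = NULL;
        pthread_mutex_lock(&lock);
        if (tail) tail->next = j; else head = j;
        tail = j;
        pthread_cond_signal(&nonempty);   /* wake one idle worker */
        pthread_mutex_unlock(&lock);
    }

    static void *worker(void *unused)
    {
        (void)unused;
        for (;;) {
            pthread_mutex_lock(&lock);
            while (!head)
                pthread_cond_wait(&nonempty, &lock);
            job *j = head;
            head = j->next;
            if (!head) tail = NULL;
            pthread_mutex_unlock(&lock);
            j->run(j->arg);               /* heavy math runs outside the lock */
            free(j);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[4];
        for (int i = 0; i < 4; i++)       /* one worker per core, say */
            pthread_create(&tid[i], NULL, worker, NULL);
        /* ... accept user requests and call submit(...) for each job ... */
        pthread_exit(NULL);               /* keep workers alive after main exits */
    }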

Depending on the hardware (one enormous SMP machine versus a farm of PCs), the problems are different.

(If you have the choice, go for PC farms.)

Edit: You may want to consider OpenMP to automatically parallelize loops. As for PC farms, they offer flexibility advantages over big SMP machines: they scale well, they are not that expensive, and they can be bought/sold/reused efficiently. Linux is probably a good choice, but it depends on which environment you're comfortable with.
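For example, OpenMP spreads the iterations of a loop across cores with a single pragma (a minimal sketch with a made-up function; each iteration must be independent of the others for this to be safe):

    /* Parallelizing a hot loop with OpenMP.
       Compile with -fopenmp (GCC/Clang) or /openmp (MSVC);
       without the flag, the pragma is ignored and the loop runs serially. */
    #include <stddef.h>

    void scale(double *data, size_t n, double factor)
    {
        #pragma omp parallel for
        for (long i = 0; i < (long)n; i++)   /* iterations split across cores */
            data[i] *= factor;
    }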

Sadly, I must say there are (to my knowledge) no good libraries for reliably and efficiently distributing computational requests over PC farms. The problem is quite hard, since you must account for breakdowns, network communication, congestion, process distribution...

Upvotes: 4

Jaydee

Reputation: 4158

You don't state what your setup is (Java, PHP, .NET? Will you be hosting the system yourself, or will it be hosted somewhere?), so these are just some off-the-cuff thoughts:

As far as I know, most modern platforms you are likely to be using will spread jobs over the available processor cores.

Spreading the workload over a number of servers can be done relatively easily with load balancing: http://www.loadbalancing.org/

You could also look at "cloud computing", where your application would be hosted by somebody like Amazon and you pay for what you use (more or less):

http://aws.amazon.com/ec2/

Other providers are available.

I'm fairly sure that if you provide more details you'll get some more specific answers.

Upvotes: 1
