devotee

Reputation: 127

How to count TBB processing instructions?

Intel TBB recommends adjusting the grain size so that each chunk takes about 10,000 to 100,000 processing instructions for the most efficient parallelism. However, there is no guideline as to what counts as a processing instruction. Do I count additions, assignments, multiplications, comparisons, etc.? If so, how should each of these operations be weighted? Is there a profiling tool that counts processing instructions the way TBB means?

Upvotes: 0

Views: 240

Answers (2)

cahuson

Reputation: 846

Kevin, as Alex says, the guideline is approximate, and there are other concerns involved. For instance, if part of the computation accesses data under a lock, that will probably dominate your run time. If the amount of computation per task is imbalanced, tweaking the unit size becomes much less important.

I didn't find the TBB documentation that talks about determining partition size, but there is a slide set here that talks about the "bathtub graph" (slide 7). It demonstrates that hitting an exact workload per task is not necessary; there is a range of sizes that will work well.

The TBB scheduler will also try to balance the work across all processors by stealing task partitions from other CPUs when it can, so one unbalanced workload doesn't completely incapacitate you.

Upvotes: 3

Alex

Reputation: 632

It is a very rough recommendation, meant to give an idea of a reasonable execution time for one piece of computational work. The idea is that a task should not be too small, while there is no benefit from tasks that are too large. Usually, you do not need to worry about this rule if you use a parallel algorithm with the default partitioner (auto_partitioner).

In some cases (e.g. when you need to use simple_partitioner), you can measure the serial run time of the algorithm and multiply it by the frequency of your CPU. This value gives you an idea of the number of "instructions" (clock ticks) in the whole problem, so you can divide the problem into pieces of the recommended size.
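The arithmetic above can be sketched in a few lines of plain C++. This is only a rough illustration, not anything taken from TBB itself: the helper names `estimate_total_ticks` and `estimate_grain_size` are hypothetical, the serial time and CPU frequency are values you would have to measure or look up yourself, and the sketch assumes every iteration costs about the same.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: estimated clock ticks for the whole serial run,
// computed as (measured serial seconds) * (CPU frequency in Hz).
inline double estimate_total_ticks(double serial_seconds, double cpu_hz) {
    assert(serial_seconds >= 0.0 && cpu_hz > 0.0);
    return serial_seconds * cpu_hz;
}

// Hypothetical helper: number of iterations per chunk so that one chunk
// costs roughly target_ticks_per_chunk (e.g. somewhere in 10,000..100,000).
// Assumes the work is spread evenly over the iterations.
inline std::size_t estimate_grain_size(double serial_seconds, double cpu_hz,
                                       std::size_t iterations,
                                       double target_ticks_per_chunk) {
    assert(iterations > 0 && target_ticks_per_chunk > 0.0);
    const double ticks_per_iteration =
        estimate_total_ticks(serial_seconds, cpu_hz) / static_cast<double>(iterations);
    // Run too short to measure: treat the whole range as a single chunk.
    if (ticks_per_iteration <= 0.0) return iterations;
    const double grain = target_ticks_per_chunk / ticks_per_iteration;
    // Never ask for more iterations per chunk than exist.
    if (grain >= static_cast<double>(iterations)) return iterations;
    // A single very heavy iteration still needs a grain size of at least 1.
    return std::max<std::size_t>(1, static_cast<std::size_t>(grain));
}
```

For example, a loop of 1,000,000 iterations that takes 0.5 s serially on a 2 GHz CPU costs about 1,000 ticks per iteration, so a target of 100,000 ticks per chunk gives a grain size of 100. That number could then be passed as the grain size argument of `tbb::blocked_range` when using simple_partitioner, and adjusted from there by measuring.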

As for tools, I suppose there are many profiling tools that can measure the execution time (or the number of CPU instructions) of your application on a particular platform (see List of performance analysis tools). In addition, you can try Intel VTune Amplifier, which has special support for TBB-based applications and can estimate the overhead introduced by Intel TBB, to understand whether the application uses TBB efficiently.

Upvotes: 2
