Neal Parker
Neal Parker

Reputation: 76

Efficiency of top vs. sample in Teradata

In Teradata, which is more efficient - Sample or Top?

Since sample is random, does that cause Teradata to do less work and result in faster returns?

Routinely, I just want to see a few rows.

Thread related: differences between top and sample in teradata sql

Upvotes: 1

Views: 8694

Answers (4)

Filip Uscilowicz
Filip Uscilowicz

Reputation: 66

I just ran three queries on a large Teradata dataset:

  1. SELECT * FROM table Sample 10;
  2. SELECT * Top 10 FROM table; (with no order by)
  3. SELECT * Top 10 FROM table ORDER BY column;

The DBQL metrics show that by far, the Top 10 with no order is the least resource-intensive. I had about a 99% drop in I/O & CPU just changing from SAMPLE to TOP.

So if your goal is purely efficiency, then TOP without the Order by is the clear winner going by TD's DBQL metrics.

DBQL Metrics

Upvotes: 5

Jason
Jason

Reputation: 945

From an efficiency standpoint:

Using Top WITHOUT an Order By will be about the same as Sample.

Using Top WITH an Order By will cause more overhead.

Upvotes: 0

Biswabid
Biswabid

Reputation: 1411

I think its better to use TOP without order by than sample. Both of these evaluate the whole expression before displaying the result.

Reference : http://www.teradatatips.com/2010/10/top-vs-sample.html

Upvotes: -2

ULick
ULick

Reputation: 999

TOP usually is the faster one, taking the shortcut out of processing when enough data has been returned.

Sample has a lot more options and works to be statistically correct. To achieve that, it may have to read more data blocks.

Upvotes: 0

Related Questions