luis
luis

Reputation: 2305

Fast way to select small sample from huge table

The table I have is huge about 100+ million entries, it is ordered by default by 'A'. There could be many items with the same column A, A increases from 0 to... A big number. I tried TABLESAMPLE but it does not quite select a good number from each A number, it skips some of them or maybe I am not using it well. So I would like to select the same amount of values from each A number. And I would like the total of selected rows to be a number, let's say 10 million or let's call it B.

Upvotes: 0

Views: 2258

Answers (3)

Gerardo Lima
Gerardo Lima

Reputation: 6703

@RBarryYoung solution is right, generic and it works for any constant statistic distribution, like id sequences (or any auto-increment column). Sometimes, though, your distribution is not constant or you can run into performance issues (SQL Server has to scan all index entries to calculate the WHERE clause).

If any of those affects your problem, consider the built-in T-SQL operator TOP that may suit your needs:

SELECT TOP (30) PERCENT *
FROM YourTable;

Upvotes: 0

YS.
YS.

Reputation: 1828

You can make use of NEWID():

SELECT TOP 100
  *
FROM
  YourTable
ORDER BY NEWID()

Upvotes: 1

RBarryYoung
RBarryYoung

Reputation: 56735

While it's not exactly clear to me what you need to achieve, when I have needed a large sample subset that is very well distributed between Parent and/or common Attribute values, I have done it like this:

SELECT  *
FROM    YourTable
WHERE   (YourID % 10) = 3

This also has the advantage that you can get another completely different sample just by changing the "3" to another digit. Plus you can change the sub-sample size by adjusting the "10".

Upvotes: 2

Related Questions