Reputation: 2305
The table I have is huge about 100+ million entries, it is ordered by default by 'A'. There could be many items with the same column A, A increases from 0 to... A big number. I tried TABLESAMPLE but it does not quite select a good number from each A number, it skips some of them or maybe I am not using it well. So I would like to select the same amount of values from each A number. And I would like the total of selected rows to be a number, let's say 10 million or let's call it B.
Upvotes: 0
Views: 2258
Reputation: 6703
@RBarryYoung solution is right, generic and it works for any constant statistic distribution, like id sequences (or any auto-increment column). Sometimes, though, your distribution is not constant or you can run into performance issues (SQL Server has to scan all index entries to calculate the WHERE clause).
If any of those affects your problem, consider the built-in T-SQL operator TOP
that may suit your needs:
SELECT TOP (30) PERCENT *
FROM YourTable;
Upvotes: 0
Reputation: 1828
You can make use of NEWID()
:
SELECT TOP 100
*
FROM
YourTable
ORDER BY NEWID()
Upvotes: 1
Reputation: 56735
While it's not exactly clear to me what you need to achieve, when I have needed a large sample subset that is very well distributed between Parent and/or common Attribute values, I have done it like this:
SELECT *
FROM YourTable
WHERE (YourID % 10) = 3
This also has the advantage that you can get another completely different sample just by changing the "3" to another digit. Plus you can change the sub-sample size by adjusting the "10".
Upvotes: 2