Pierce
Pierce

Reputation: 41

SQL select a sample of rows

I need to select sample rows from a set. For example if my select query returns x rows then if x is greater than 50 , I want only 50 rows returned but not just the top 50 but 50 that are evenly spread out over the resultset. The table in this case records routes - GPS locations + DateTime. I am ordering on DateTime and need a reasonable sample of the Latitude & Longitude values. Thanks in advance [ SQL Server 2008 ]

Upvotes: 4

Views: 8565

Answers (4)

naiem
naiem

Reputation: 11

If you want an statically correct sample, tablesample is a wrong solution. A good solution as I described in here based on a Microsoft Research paper, is to create a materialized view over your table which includes an additional column like CAST( ROW_NUMBER() OVER (...) AS BYTE ) AS RAND_COL_, then you can add an index on this column, plus other interesting columns and get statistically correct samples for your queries fairly quickly. (by using WHERE RAND_COL_ = 1).

Upvotes: 0

Janick Bernet
Janick Bernet

Reputation: 21194

Every n'th row to get 50:

SELECT *
FROM table
WHERE row_number() over() MOD (SELECT Count(*) FROM table) / 50 == 0
FETCH FIRST 50 ROWS ONLY

And if you want a random sample, go with jimmy_keen's answer.

UPDATE: In regard to the requirement for it to run on MS SQL, I think it should be changed to this (no MS SQL Server around to test though):

  SELECT TOP 50 *
   FROM (
     SELECT t.*, row_number() over() AS rn, (SELECT count(*) FROM table) / 50 AS step
     FROM table t
   )
   WHERE rn % step == 0

Upvotes: 3

k.m
k.m

Reputation: 31454

To get sample rows in SQL Server, use this query:

SELECT TOP 50 * FROM Table
ORDER BY NEWID();

If you want to get every n-th row (10th, in this example), try this query:

SELECT * From 
(
  SELECT  *, (Dense_Rank() OVER (ORDER BY Column ASC)) AS Rank 
  FROM Table
) AS Ranking 
WHERE Rank % 10 = 0; 

Source

More examples of queries selecting random rows for other popular RDBMS can be found here: http://www.petefreitag.com/item/466.cfm

Upvotes: 3

CesarGon
CesarGon

Reputation: 15335

I suggest that you add a calculated column to your resultset on selection that is obtained as a random number, and then select the top 50 sorted by that column. That will give you a random sample.

For example:

SELECT TOP 50 *, RAND(Id) AS Random
FROM SourceData
ORDER BY Random

where SourceData is your source data table or view. This assumes T-SQL on SQL Server 2008, by the way. It also assumes that you have an Id column with unique ids on your data source. If your ids are very low numbers, it is a good practice to multiply them by a large integer before passing them to RAND, like this:

RAND(Id * 10000000)

Upvotes: 1

Related Questions