whytheq

Reputation: 35557

Select 1000 distinct names from 100 million records via standard SQL

I have a table tb_FirstName with a single field, FirstName. The table holds 100 million non-null records with many repetitions, e.g. John occurs 2 million times. The distinct count of FirstName is over 2 million.

How do I select 1000 distinct names as quickly as possible using standard SQL?

I'm currently using the following but this is

Upvotes: 7

Views: 18696

Answers (5)

Oleksandr Fedorenko

Reputation: 16904

An option using a GROUP BY clause:

SELECT TOP 1000 FirstName
FROM WHData.dbo.tb_DimUserAccount
GROUP BY FirstName
ORDER BY FirstName

Upvotes: 1

Harshil

Reputation: 411

Try this:

SELECT TOP 1000 FirstName
FROM (
    SELECT ROW_NUMBER() OVER (PARTITION BY FirstName ORDER BY FirstName) AS no,
           FirstName
    FROM WHData.dbo.tb_DimUserAccount
) AS T1
WHERE no = 1

or

SELECT DISTINCT TOP 1000 FirstName
FROM WHData.dbo.tb_DimUserAccount
ORDER BY FirstName

Upvotes: 3

Romil Kumar Jain

Reputation: 20745

Because the results are ordered by FirstName, the rows have to be sorted on that column before the first 1000 can be returned.

Without an index on FirstName, this requires a full table scan. If an index exists on FirstName, the engine can scan the index instead, which can reduce the time.
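
As a rough sketch of the index mentioned above, assuming SQL Server syntax and a hypothetical index name (IX_tb_DimUserAccount_FirstName does not appear in the original post):

```sql
-- Hypothetical index name; the table and column come from the queries in this thread.
-- A nonclustered index keeps FirstName values in sorted order, so a
-- DISTINCT / TOP ... ORDER BY FirstName query can read them from the index
-- instead of sorting the whole table.
CREATE NONCLUSTERED INDEX IX_tb_DimUserAccount_FirstName
    ON dbo.tb_DimUserAccount (FirstName);
```

Building the index over 100 million rows takes time and space of its own, so it is mainly worth it if the query runs more than once.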

Upvotes: 2

sgeddes

Reputation: 62831

Seems like you could use TOP 1000 with DISTINCT:

SELECT DISTINCT TOP 1000 FirstName
FROM WHData.dbo.tb_DimUserAccount
ORDER BY FirstName

Condensed SQL Fiddle Demo
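
Since the question asks for standard SQL and TOP is specific to SQL Server, a possible equivalent using the row-limiting clause from SQL:2008 is sketched here. Which form applies is an assumption about the target engine: PostgreSQL, Oracle 12c+ and Db2 accept FETCH FIRST as written, while SQL Server 2012 and later expects the OFFSET ... FETCH NEXT spelling in the second query.

```sql
-- Standard SQL (SQL:2008) row limiting; unqualified table name used for portability
SELECT DISTINCT FirstName
FROM tb_DimUserAccount
ORDER BY FirstName
FETCH FIRST 1000 ROWS ONLY;

-- SQL Server 2012 and later: OFFSET is required before FETCH
SELECT DISTINCT FirstName
FROM WHData.dbo.tb_DimUserAccount
ORDER BY FirstName
OFFSET 0 ROWS FETCH NEXT 1000 ROWS ONLY;
```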

Upvotes: 12

Nick

Reputation: 399

Make sure you have an index defined on FirstName.

SELECT TOP 1000 FirstName
FROM (SELECT DISTINCT FirstName
FROM dbo.tb_DimUserAccount) N
ORDER BY FirstName

Upvotes: 2
