Reputation: 35557

Select 1000 distinct names from 100 million records via standard sql

I have a table tb_FirstName with one field FirstName. The table has 100 million non null records with lots of repetitions e.g. John occurs 2 million times. The distinct count of FirstName is over 2 million.

How do I select 1000 distinct names as quickly as possible using standard sql?

I'm currently using the following but this is

tSQL

Maybe not as efficient as it could be.

SELECT x.FirstName
FROM (
    SELECT  FirstName,
            rnk = RANK() OVER (ORDER BY Firstname)
    FROM    WHData.dbo.tb_DimUserAccount A
    GROUP BY FirstName
    ) x
WHERE rnk <=1000

Upvotes: 7

Answers (5)

Oleksandr Fedorenko

Reputation: 16904

Option with GROUP BY clause

SELECT TOP 1000 FirstName
FROM WHData.dbo.tb_DimUserAccount
GROUP BY FirstName
ORDER BY FirstName

Upvotes: 1

Harshil

Reputation: 411

Try this

SELECT TOP 1000 FirstName FROM 
(SELECT 
ROW_NUMBER() OVER(PARTITION BY FirstName ORDER BY FirstName) NO,
 FirstName FROM WHData.dbo.tb_DimUserAccount )
  AS T1 WHERE no =1

SELECT DISINCT TOP 1000 FirstName
FROM WHData.dbo.tb_DimUserAccount ORDER BY FirstName

Upvotes: 3

Romil Kumar Jain

Reputation: 20745

You need the data after sorting the results on FirstName fields.

It requires full table scan if Index is not created. If Index is created on FirstName then Unique Index scan can improve the time.

Upvotes: 2

sgeddes

Reputation: 62831

Seems like you could use TOP 1000 with DISTINCT:

SELECT DISINCT TOP 1000 FirstName
FROM WHData.dbo.tb_DimUserAccount
ORDER BY FirstName

Condensed SQL Fiddle Demo

Upvotes: 12

Nick

Reputation: 399

Make sure you have an index defined on FirstName.

SELECT TOP 1000 FirstName
FROM (SELECT DISTINCT FirstName
FROM dbo.tb_DimUserAccount) N
ORDER BY FirstName

Upvotes: 2

Select 1000 distinct names from 100 million records via standard sql

Answers (5)

Related Questions