GetStacked
GetStacked

Reputation: 39

Every distinct value for every column

I'd like to create a dynamic select that returns every distinct value for each column in a wide table. I.e.

select distinct @mycolumn 
from @mytable

for every column and the results combined to a single table.

Edit1:

Example: Example

Edit2: The order of the returned data won't matter, and the source table can have all sorts of data types.

Any advice appreciated, thank you!

Upvotes: 2

Views: 3032

Answers (2)

Zohar Peled
Zohar Peled

Reputation: 82474

The only way I can think of is very cumbersome and probably extremely slow: Using a Tally table (I've generated one using a recursive cte for the sake of this answer, but that's also not a very good way to do that...) and multiple derived tables left joined to that tally table I was able to come up with something that will generate the desired output.
However, as I wrote on the top - it's very cumbersome and probably extremely slow (I've tested only on a table with 5 columns and 6 rows so I have no idea about execution speed).

DECLARE @Count int
select  @Count = COUNT(1) 
FROM YourTable

;with tally as (
    select 1 as n
    union all 
    select n + 1
    from tally 
    where n < @Count 
)

SELECT Column1, Column2, Column3, Column4, Column5
FROM tally 
LEFT JOIN 
(
    SELECT Column1, ROW_NUMBER() OVER (ORDER BY Column1) rn
    FROM 
    (
        SELECT DISTINCT Column1
        FROM YourTable 
    ) t1
) d1 ON(n = d1.rn)
LEFT JOIN 
(
    SELECT Column2, ROW_NUMBER() OVER (ORDER BY Column2) rn
    FROM 
    (
        SELECT DISTINCT Column2
        FROM YourTable 
    ) t1
) d2 ON(n = d2.rn)
LEFT JOIN 
(
    SELECT Column3, ROW_NUMBER() OVER (ORDER BY Column3) rn
    FROM 
    (
        SELECT DISTINCT Column3
        FROM YourTable 
    ) t1
) d3 ON(n = d3.rn)
LEFT JOIN 
(
    SELECT Column4, ROW_NUMBER() OVER (ORDER BY Column4) rn
    FROM 
    (
        SELECT DISTINCT Column4
        FROM YourTable 
    ) t1
) d4 ON(n = d4.rn)
LEFT JOIN 
(
    SELECT Column5, ROW_NUMBER() OVER (ORDER BY Column5) rn
    FROM 
    (
        SELECT DISTINCT Column5
        FROM YourTable 
    ) t1
) d5 ON(n = d5.rn)

Dynamic version:

DECLARE @TableName sysname = 'YourTableName'

DECLARE @Sql nvarchar(max) = 
'
DECLARE @Count int
select  @Count = COUNT(1) 
FROM '+  @TableName +'

;with tally as (
    select 1 as n
    union all 
    select n + 1
    from tally 
    where n < @Count 
)

SELECT '

SELECT @Sql = @Sql + Column_Name +','
FROM INFORMATION_SCHEMA.COLUMNS 
WHERE TABLE_NAME = @TableName

SELECT @Sql = LEFT(@Sql, LEN(@Sql) - 1) + ' FROM tally t'

SELECT @Sql = @Sql + ' LEFT JOIN (SELECT '+ Column_Name +', ROW_NUMBER() OVER (ORDER BY ' + Column_Name +') rn
  FROM 
  (
    SELECT DISTINCT '+ Column_Name +' FROM '+ @TableName +') t
  ) c_'+ Column_Name + ' ON(n =  c_'+ Column_Name + '.rn)'
FROM INFORMATION_SCHEMA.COLUMNS 
WHERE TABLE_NAME = @TableName

EXEC(@Sql)

Update

Tested on a table with 22 columns and 47,000 rows, my suggestion took 46 seconds when using a proper tally table. on Sql server 2014. I was surprised - I thought it would take at least 2-3 minutes.

Upvotes: 2

Kahn
Kahn

Reputation: 1660

Here's a dynamic set I was working on. I'm running out of time so it's not cleaned up, and it determines the dynamic row numbers by the max number of rows in the table as a whole, meaning that if you have any duplicates in any column at all, you'll be left with rows where every single column is null.

But other than that, this should work perfectly fine, and the script contains the necessary info showing you how to concatenate a final "WHERE S1.COLNAME IS NOT NULL AND S2.COLNAME IS NOT NULL AND .." filter to the result table, to eliminate those full-null rows.

Other than that, here's the script. It's gonna be heavy, obviously, so I included a (nolock) hint in it, and a "WHERE ColName is not null" to remove useless results.

Try this on a smaller table and see it work.

/*
Set your table and schema on @MYTABLE and @MYSCHEMA variables.
*/
SET NOCOUNT ON

DECLARE @MYTABLE SYSNAME = 'Mytablename here'
    , @MYSCHEMA sysname = 'dbo'

DECLARE @SQL NVARCHAR(MAX) = '', @COLNAME sysname = '', @MYCOLS NVARCHAR(max) = ''

DECLARE @COL_NOW INT = 1, @COL_MAX INT = 
    (SELECT COUNT(*) 
    FROM sys.columns
    WHERE object_id = (SELECT object_id FROM sys.tables where name = @MYTABLE and SCHEMA_NAME(schema_id) = @MYSCHEMA))

SELECT @COLNAME = name 
    FROM sys.columns 
    WHERE column_id = 1
    and object_id = (SELECT object_id FROM sys.tables where name = @MYTABLE and SCHEMA_NAME(schema_id) = @MYSCHEMA)

SET @SQL = 'FROM 
    (SELECT ROW_NUMBER() OVER (ORDER BY '+@COLNAME+' ASC) RN
    FROM '+@MYSCHEMA+'.'+@MYTABLE+' (nolock)) S'

WHILE @COL_NOW <= @COL_MAX
BEGIN

    SELECT @COLNAME = name 
    FROM sys.columns 
    WHERE column_id = @COL_NOW 
    and object_id = (SELECT object_id FROM sys.tables where name = @MYTABLE and SCHEMA_NAME(schema_id) = @MYSCHEMA)

    SELECT @SQL = @SQL+'
FULL JOIN 
    (SELECT DISTINCT DENSE_RANK() OVER (ORDER BY '+@COLNAME+' ASC) RN, '+@COLNAME+'
    FROM '+@MYSCHEMA+'.'+@MYTABLE+' (nolock) 
    WHERE '+@COLNAME+' IS NOT NULL) S'+CAST(@COL_NOW AS NVARCHAR(25))+' ON S'+CAST(@COL_NOW AS NVARCHAR(25))+'.RN = S.RN'

    IF @COL_NOW = 1
        SELECT @MYCOLS = @MYCOLS+' S'+CAST(@COL_NOW AS NVARCHAR(25))+'.'+@COLNAME           
    ELSE
        SELECT @MYCOLS = @MYCOLS+', S'+CAST(@COL_NOW AS NVARCHAR(25))+'.'+@COLNAME

    SET @COL_NOW = @COL_NOW+1

END

SELECT @SQL = 'SELECT'+@MYCOLS+'
'+@SQL+'
ORDER BY S1.RN ASC';

--PRINT(@SQL); -- To check resulting dynamic SQL without executing it (Warning, print will only show first 8k characters)

EXEC sp_executesql @SQL;
GO

Upvotes: 1

Related Questions