Reputation: 1
I have a table with 700 columns. I am trying to get a list of distinct values for each column and their count. I am using the below query to get the result for 1 column
Select distinct col1, count(*) from MyTable group by 1.
Result:
col1 count(*)
a 10
b 20
c 40
How can I get the result for all columns using a single query in the most optimal way?
Upvotes: 0
Views: 6241
Reputation: 1
Replace [table name] with the table you need counts for.
DECLARE @table varchar(100) = '[table name]'
DECLARE @i INT = 1, @cntOUT int, @SQL nvarchar(500) = ''
DECLARE @ParmDef nvarchar(500) = N'@cnt int OUTPUT';
SELECT column_id, name, 0 as record_count
INTO #T1
FROM sys.all_columns c
WHERE c.object_id = (SELECT object_id FROM sys.objects WHERE name = @table AND type = 'U')
WHILE @i <= (SELECT MAX(column_id) FROM #T1)
BEGIN
SELECT @SQL = 'SELECT @cnt = COUNT(DISTINCT ' + name + ') FROM ' + @table + ';'
FROM #T1 WHERE column_id = @i;
EXECUTE sp_executesql @stmt = @SQL, @ParmDefinition = @ParmDef, @cnt = @cntOUT OUTPUT;
UPDATE #T1 SET record_count = @cntOUT WHERE column_id = @i
SET @i = @i + 1
END
SELECT * FROM #T1
--DROP TABLE #T1
Upvotes: 0
Reputation: 384
A list with distinct values for each column is impossible. What if column A has 5 distinct values and column B has 7. What would your list look like?
The other question is easier, but as @Gordon Linoff states, takes 2 steps. Elaborating on his answer, for MS SQL:
select replace(replace(' count(distinct([col])) as [col],',
'[tab]', table_name
), '[col]', column_name
)
from information_schema.columns
where table_name = 'your_table';
Copy the results and paste them in a new query window between.
SELECT
[[results query 1]]
FROM your_table
Remember to delete the last ',' from query 1 results.
Upvotes: 0
Reputation: 1270061
The basic query is:
select col001, count(*) from MyTable group by col001 union all
select col002, count(*) from MyTable group by col002 union all
. . .
select col700, count(*) from MyTable group by col700 ;
Not pleasant, but that is basically the query you need to run. SQL doesn't really do multiple independent aggregations more efficiently than doing them separately (even using grouping sets
, in my experience).
You can construct the query. One way is to run something like this:
select replace(replace('select [col], count(*) as cnt from [tab] group by [col] union all ',
'[tab]', table_name
), '[col]', column_name
)
from information_schema.columns
where table_name = 'mytable' and table_schema = ??;
You can then copy the generated SQL (removing the final union all
) and run it.
Note: That above is generic; the exact code might differ by database.
Upvotes: 1