Reputation: 1
I am looking for a way to run frequencies for a number of different columns. I'm used to working in SAS and SQL is still new. In SAS I'd run a proc freq; table var1 var2 var3; run;
command; however, it appears as though this requires a bit more code in SQL. Because of the number of variables/columns I'm obtaining frequencies for, I really don't want to have to write out the same code over and over again. Eg:
select
language,
count(*) as freq
from #pop
group by language
order by language
I've been looking at cursor commands to loop through column names, so I start with generating a new temporary table with all the column names that I'd like to loop through. Here's what I've been able to put together:
drop table if exists #dem_table;
create table #dem_table (dem_vars varchar(50) not null);
insert into #dem_table values ('age_cat'),('race'),('ethnicity'),('language'),('sex'),('orientation'),('income'),('urban'),('poverty'),('children'),('marital_status'),('state');
declare @column varchar(50);
declare cursor_dem cursor for
select * from #dem_table;
open cursor_dem;
fetch next from cursor_dem into @column;
while @@fetch_status = 0
begin
select @column, count(*) as freq
from #pop
group by @column
order by @column;
fetch next from cursor_dem into @column;
end;
close cursor_dem;
deallocate cursor_dem;
It works until I reach the "group by" and "order by" syntax, and then I receive the following errors:
Each GROUP BY expression must contain at least one column that is not an outer reference. The SELECT item identified by the ORDER BY number 1 contains a variable as part of the expression identifying a column position. Variables are only allowed when ordering by an expression referencing a column name.
Wondering if dynamic sql might be helpful? But I can't get it to iterate through columns that way either.
Upvotes: 0
Views: 161
Reputation: 33571
Here is another option leveraging the sys.columns to build the dynamic sql for this. It is a bit simpler than using a dynamic unpivot for something like this.
If you want the distinct counts in a single row per column it would be something like this.
declare @sql nvarchar(max) = 'select '
, @TableName sysname = 'YourTableName'
select @sql = @sql + quotename(c.name) + ' = count(distinct ' + quotename(c.name) + '), '
from sys.columns c
where object_id = object_id(@TableName)
set @sql = left(@sql, len(@sql) - 1) + ' from ' + QUOTENAME(@TableName)
exec sp_executesql @sql
If however you want individual rows you could do something similar like this.
declare @sql nvarchar(max) = ''
, @TableName sysname = 'YourTableName'
select @sql = @sql + 'select Name = ''' + c.name + ''', ValueCount = count(' + quotename(c.name) + '), ' + quotename(c.name) + ' = count(distinct ' + quotename(c.name) + ') from ' + quotename(@TableName) + ' union all '
from sys.columns c
where object_id = object_id(@TableName)
set @sql = left(@sql, len(@sql) - 10)
exec sp_executesql @sql
And last but not least if you really do want a new result set for each value as you indicated you can modify to something like this.
declare @sql nvarchar(max) = ''
, @TableName sysname = 'YourTableName'
select @sql = @sql + 'select Name = ''' + c.name + ''', ValueCount = count(' + quotename(c.name) + '), ' + quotename(c.name) + ' = count(distinct ' + quotename(c.name) + ') from ' + quotename(@TableName) + ';'
from sys.columns c
where object_id = object_id(@TableName)
exec sp_executesql @sql
Upvotes: -1
Reputation: 4695
This could probably be pared down a bit, or adjusted to accept a specific list of columns if you are interested only in a subset (For instance, you probably don't care about the distribution of a surrogate key, so you could exclude that. Or maybe you want to exclude float
columns), but here is a snippet I use when I want to find the distinct values in a table, and the counts thereof.
-- Field-by-Field with loop
drop table if exists #DistVals
create table #DistVals
(
TableName nvarchar(384),
OrdinalPosition int,
Field nvarchar(128),
Type nvarchar(128),
Value nvarchar(max),
Ct int
)
go
declare
@TableName nvarchar(128), -- table here
@SchemaName nvarchar(128) -- schema here
declare @sql nvarchar(max)
declare c cursor local fast_forward for
select concat('insert into #DistVals (TableName, OrdinalPosition, Field, Type, Value, Ct) select TableName = ', quotename(concat(table_catalog, '.', table_schema, '.', table_name), ''''), ', OrdinalPosition = ', ordinal_position, ', Field = ', quotename(column_name, ''''), ', Type = ', quotename(data_type, ''''), ', Value = convert(nvarchar(max), ', quotename(column_name), '), Ct = count(1) from ', quotename(table_catalog), '.', quotename(table_schema), '.', quotename(table_name), ' group by ', quotename(column_name))
from information_schema.columns
where table_name = @TableName
and table_schema = @SchemaName
-- and column_name <> 'MySurrogateKey'
open c
fetch next from c into @sql
while @@fetch_status = 0
begin
exec sp_executesql @sql
fetch next from c into @sql
end
select
TableName,
OrdinalPosition,
Field,
Type,
Value,
Ct,
TotalValueCt = sum(ct) over (partition by TableName, Field)
from #DistVals
order by TableName, OrdinalPosition, Ct desc
I find this approach the easiest to reason about, and since I'm typically running this in an ad-hoc fashion, I usually don't care that I have to re-hit the table n times for each column.
EDIT If you really want as many result sets as you have columns to analyze, you could replace the insert into #DistVals
with a simple select. I don't know your exact use case, but to me, having a single dataset containing all columns is more useful.
That said, if you want to get everything in one go, You could also do that by doing a dynamic unpivot
.
-- all at once with dynamic unpivot
declare
@TableName nvarchar(128),
@SchemaName nvarchar(128)
declare
@Columns nvarchar(max),
@ConvertedColumns nvarchar(max),
@SQL nvarchar(max)
select
@Columns = string_agg(quotename(name), ','),
@ConvertedColumns = string_agg(concat(quotename(name), ' = convert(nvarchar(max), ', quotename(name), ')'), ',')
from sys.columns
where object_id = object_id(@SchemaName + '.' + @TableName)
select @SQL = concat
('
select
Name,
ValueCt = count(1),
DistinctValueCt = count(distinct Value)
from
(
select ', @ConvertedColumns, '
from ', quotename(@SchemaName), '.', quotename(@TableName), '
) s
unpivot(Value for Name in(', @Columns, ')) u
group by name
')
exec sp_executesql @SQL
Upvotes: -1