Pravin Perumal
Pravin Perumal

Reputation: 21

How to get the most occurring values in each column of a table

I need some help with t-sql and I am new to SQL scripting.

I have a table with little over 70 columns and it is a datawarehouse. Some of values from the feeds are unknown values. For ex: if a date value is unknown, it will be 19000101 and emailkey = -1, while the default value might be NULL.

I need to write a script that retrieves the following information

Column 1 = Name of the column [This can be hard-coded or retrieved from a temp table]

Column 2 = the value that has the highest frequency of occurrence. [It can be NULL, a default value or some other valid value]. It will be the result of the following query: SELECT top 1 HotelKey AS countvalue from dbo.factBooking GROUP by HotelKey ORDER by count(*) desc

Column 3 = The count of the most frequently occurring value]. It will be the result of the following query: SELECT top 1 count() AS countvalue from dbo.factBooking GROUP by HotelKey ORDER by count() desc

Column 4 = This should show the percentage of column 3 against the total count]. It will be the result of the following query: (SELECT top 1 count()from dbo.factBooking GROUP by HotelKey ORDER by count() desc)/( SELECT count(*)from dbo.factBooking)

I want the query to compute the above value for all the columns in the table. Please let me know if I am not clear or if you need more information.

Upvotes: 1

Views: 440

Answers (1)

Pravin Perumal
Pravin Perumal

Reputation: 21

I used the following query to generate the data and it works fine:

use [Database Name]

DROP TABLE #tempcol

DROP TABLE #tempbigquery

CREATE table #tempbigquery (sqlquery varchar(max));

select ColID,Name As ColName into #tempcol from syscolumns where id=object_id('factFlights') ORDER BY colid ASC declare @Counter integer = 1 , @ColCount Integer = (select count() from #tempcol),@RowCount decimal(18,2)= (Select count() from dbo.factFlights) Declare @colName Varchar(100),@sqlquery NVarchar(max) = '' While @Counter < @ColCount+3 Begin

   Set @colName  =  (Select ColName from #tempcol where ColID = @Counter)
      if @colName IS not NULL
   Set @sqlquery  =  ISNULL('SELECT ColumnName,ColumnKey,CountValue,[Percent] FROM ( SELECT TOP 1 ' + Char(39) + @colName + char(39) + ' ColumnName,' + 'CAST(' + @colName + ' AS varchar(max))' + ' As ColumnKey,COUNT(*) AS CountValue,(COUNT(*) * 100.00) / ' + Cast( @RowCount   As Nvarchar(100)) + '  As [Percent] from dbo.factFlights GROUP by ' +  @colName + ' ORDER by count(*) desc ) As  ' + @colName + '_info  UNION  ' +   char(13),'')
      --Print @sqlquery
   Set @Counter  = @Counter + 1
      INSERT into #tempbigquery (sqlquery) VALUES (@sqlquery)

End

SELECT * FROM #tempbigquery

Upvotes: 1

Related Questions