Mike S

Reputation: 549

Best way to analyze SQL Server table with a huge amount of columns?

I have a SQL Server table with 900+ columns and hundreds of millions of rows. I built a stored procedure that grabs a small subset of those rows (typically under 500k rows at a time) and, for every column, gets a count of each distinct value along with the percentage of rows in which that value occurs. I do this by first narrowing down the data and inserting it into a temp table, then running the analysis on the temp table.
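
For reference, the per-column analysis boils down to something like this (a minimal sketch; #tmp and col1 are placeholder names I'm using here, and the real procedure would repeat this for each of the 900+ columns, presumably via dynamic SQL):

SELECT col1,
       COUNT(*)                                 AS value_count,
       -- share of the batch that has this value in col1
       100.0 * COUNT(*) / SUM(COUNT(*)) OVER () AS pct_of_rows
FROM #tmp
GROUP BY col1;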

Recently we have needed to run this analysis on a much larger set of data (50 to 100 million rows at a time). Unfortunately, when I try to run it, it seems to run forever. I suspect that inserting that much data into a temp table is part of the problem, but grouping on all the values in every column must also take a long time.

Is there an easier or more efficient way of doing this? I'm using SQL Server 2012.

Upvotes: 1

Views: 1452

Answers (1)

Dmitrij Kultasev

Reputation: 5745

You might be running out of memory, so the server starts paging to disk. You can monitor disk usage in Task Manager (Ctrl+Shift+Esc) to see whether that is the cause. If it is, you can process the data in a loop, in batches. Something like

DECLARE @max_id INT = (SELECT MAX(id) FROM your_table),
    @checkpoint INT = 500000,  -- rows per batch
    @current_id INT = 0;
WHILE (@current_id < @max_id)
BEGIN
   TRUNCATE TABLE #tmp;  -- clear the previous batch before loading the next
   INSERT INTO #tmp
   SELECT *
   FROM your_table
   WHERE id > @current_id AND id <= @current_id + @checkpoint;
-- your analysis here
   SET @current_id += @checkpoint;
END

I wrote it without access to a database, so there may be syntax or logical errors, but I think the main idea is quite clear.

Upvotes: 1
