John Smith
John Smith

Reputation: 71

SQL - Snowflake Minus Operator

Hi I am running a query to check for any changes in a table between two dates....

SELECT * FROM TABLE_A where run_time = current_date() 
MINUS
SELECT * FROM TABLE_A where run_time = current_date()-1 

The first select statement (where run_time = current_date() return 3,357,210 records. The second select statement (where run_time = current_date()-1 returns 0 records.

Using the MINUS operator, I was expecting to see 3,357,210 records (3,357,210 - 0) but instead I get 2,026,434

Any thoughts on why? Thanks

Upvotes: 1

Views: 3919

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1269953

This query:

SELECT * FROM TABLE_A where run_time = current_date() 
MINUS
SELECT * FROM TABLE_A where run_time = current_date()-1 

Will always return all unique rows from Table_A. Why? Because run_time is one of the columns and it is different in the two queries. MINUS looks at all the columns. Note that this is true even if the second query returns rows, because the values on the rows are different.

If your total is different from the total number of rows, then you have duplicates in the table.

Here are two ways to get the new records. Let me assume that identical records are identified by col1/col2:

select col1, col2
from table_a
where run_time in (current_date(), current_date() -1)
group by col1, col2
having min(run_time) = current_date();

That is, the first occurrence is the current date.

Or:

select col1, col2
from table_a a
where a.run_time = current_date() and
      not exists (select 1
                  from table_a a2
                  where a2.run_time = current_date() - 1 and
                        a2.col1 = a.col1 and a2.col2 = a.col2
                 );

Upvotes: 0

MatBailie
MatBailie

Reputation: 86735

https://docs.snowflake.com/en/sql-reference/operators-query.html#minus-except

Removes rows from one query’s result set which appear in another query’s result set, with duplicate elimination.

Thus, you only have 2,026,434 unique values in your first query. The missing million-and-a-bit are the duplicates, which have been eliminated.

Upvotes: 2

Related Questions