Find all columns that return unique rows from table in snowflake

Is there any way to achieve this?

I am working on snowflake tables and for most of them primary key is not defined in ddl. And also since the primary key is not enforced in snowflake, it has duplicate records.

Is there any way to get all the columns name that fetches unique records.

My tables has more than 30 columns and to identify the columns that returns unique rows is a troublesome task as I have to check by distinct column name and then matching it total number of rows present in table.

For eg. Total number of rows count(*) = 50

It has col - 1,2,3,4,5,6,7,8,9

Distinct of col 1 returns 20 rows
Distinct of col1, col2 returns 30
Distinct of col1, col2, col7 returns 50 rows

I have to manually take the columns name and then match it with total records.

Is there any way I can get the column name without checking each column or column combination one by one?

It would be great help if someone can guide me on this.

Upvotes: 0

Answers (2)

Sujan S Ghosh

Reputation: 277

I guess your requirement is to get unique rows from the source table which has duplicate rows (no primary keys), if yes, you can use a merge statement.

Uniqueness of rows can be achieved using a Merge statement It's advised to use MERGE to process each record differently based on the DML kind https://docs.snowflake.com/en/sql-reference/sql/merge.html#merge

Example: Here we are assuming lets say if you have any duplicate data in the source table, we will be using a merge statement with a temporary table and the merge will be on the hash key for each row between target table and source table and if there is no match, then only it will insert the row into target table.

DROP TABLE IF EXISTS tgt_tb;

create or replace temp table tgt_tb (id int, name string);

DROP TABLE IF EXISTS src_tb; create temp table src_tb (id int, name string);

insert into src_tb select 1,'a' union all select 1,'a' union all select 2,'b' union all select 3,'c';

insert into tgt_tb select 3,'c';

DROP TABLE IF EXISTS src_stg_tb; create temp table src_stg_tb as select * from

(select , ROW_NUMBER() OVER(PARTITION BY hash() order by hash() ) as rnm, hash() as hashkey from src_tb ) A where A.rnm=1;

merge into tgt_tb TGT using src_stg_tb SRC on hash(tgt.id, tgt.name) = SRC.hashkey when not matched then insert Values (src.id,src.name)

--select * from tgt_tb; With MERGE you can specify “WHEN MATCHED” and “WHEN NOT MATCHED”. For just inserting new data, we use a MERGE statement with JUST a “WHEN NOT MATCHED” clause.

Regards, Sujan

Upvotes: 0

Nathan Griffiths

Reputation: 12756

There's no simple way I can think of, but you could start running a query like this one:

SELECT column1, column2, count(*)
FROM mytable
GROUP BY 1,2
HAVING COUNT(*) > 1

If this returns results, add another column:

SELECT column1, column2, column3, count(*)
FROM mytable
GROUP BY 1,2,3
HAVING COUNT(*) > 1

Repeat until the query returns zero results, that means you now have a unique combination.

However, note that this does NOT guarantee that combination will remain unique over time, nor that this is the only unique combination of columns in the data.You will need a deeper understanding of the data and it's source to find a truly unique key, if one exists.

Upvotes: 1

Find all columns that return unique rows from table in snowflake

Answers (2)

Related Questions