kalyan4uonly
kalyan4uonly

Reputation: 335

how to return a flag value based on table data

I have a table "emp" as like follows in BIG QUERY

enter image description here

If we group by all columns or distinct records then we will get 3 as count, if we count(*) then it will give 4

I need to write a query which has to return "1", if distinct records and count(*) values are not equal (3 != 4 in our case) else "0" as flag like as below

Thanks! enter image description here

Upvotes: 0

Views: 833

Answers (3)

Adrian White
Adrian White

Reputation: 1804

This might scale a little nicer ...

  • just replace bigquery-public-DATA with your project and austin_bikeshare with your dataset.

Why nicer ?

  1. querying __ TABLE __ is not billed so free :-)
  2. Count distinct over the hash is slightly faster (would use approx_count_distinct and round if super large datasets)
  3. Your flag results in a number between 0 and 1 - showing scale of dups in each table - this can be
  4. Running over the whole dataset which I'm assuming you'll do is trivial.

#standard sql:

select 
  b.table_id Table_Name
  ,round(safe_divide(count(distinct(FARM_FINGERPRINT(FORMAT('%T', a)))) , row_count),4) dup_ratio 
  ,cast( trunc(safe_divide(count(distinct(FARM_FINGERPRINT(FORMAT('%T', a)))) , row_count)) as INT64) your_flag 
from 
  `bigquery-public-data.austin_bikeshare.*` a
inner join 
  bigquery-public-data.austin_bikeshare.__TABLES__ b 
 on a._TABLE_SUFFIX = b.table_id
group by 
  table_id  ,row_count

Result :

Table_Name           dup_ratio  your_flag
bikeshare_stations   0.0104     0
bikeshare_trips      1.0        1

Upvotes: 0

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173013

Below is for BigQuery Standard SQL

#standardSQL
SELECT IF(COUNT(*) = COUNT(DISTINCT FORMAT('%t', t)), 0, 1) AS flag
FROM `project.dataset.emp` t   

if to apply to sample data from your question - result is

Row flag     
1   1    

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269953

If you want to find duplicates, you can use:

select empno, empname, deptno
from t
group by empno, empname, deptno
having count(*) > 1;

You can then incorporate this into a query to return a flag if you like:

select exists (select 1
               from t
               group by empno, empname, deptno
               having count(*) > 1
              ) as flag

Upvotes: 0

Related Questions