Using self join to find duplicates in SQL

Question

I know that there are other questions like this. However, my question is about why the query that I am using is not returning the optimal results. Below is the query. To give context, I have a single table that has 113 columns/fields. However, only 4 really matter; acct, year, qtr, cnty (county). This table is a list of employers by establishment. An employer can appear more than once. The same person owning 12 starbucks being the best example. What I am looking for is a query that will show when acct values have different cnty values. The below query works without error but it shows far too much. It shows rows where the acct value is the same but the cnty value is the same as well. Any thoughts on looking at this query as to why it shows too much?

  select distinct t1.acct, t1.year, t1.qtr, t1.cnty
  from dbo.table t1 join dbo.table t2 on t1.acct=t2.acct
  where (t1.cnty <> t2.cnty)
  order by t1.acct, t1.year, t1.qtr, t1.cnty

Intended result

  acct                                               year        qtr         cnty
  1234567777                                         2007        4           7
  1234567777                                         2008        1           9
  1234567890                                         2006        4           31
  1234567890                                         2007        1           3
  2345678901                                         2006        4           7
  2345678901                                         2007        2           1

Gordon Linoff · Accepted Answer

Is this what you want?

  select distinct t.acct, t.year, t.qtr, t.cnty
  from (select t.*, min(cnty) over (partition by acct, year, qtr) as min_cnty,
               max(cnty) over (partition by acct, year, qtr) as max_cnty
        from dbo.table t
       ) t
  where min_cnty <> max_cnty;

Using self join to find duplicates in SQL

Answers (1)

Related Questions