Find rows with duplicate values in a column

I have a table author_data:

 author_id | author_name
 ----------+----------------
 9         | ernest jordan
 14        | k moribe
 15        | ernest jordan
 25        | william h nailon 
 79        | howard jason
 36        | k moribe

Now I need the result as:

 author_id | author_name                                                  
 ----------+----------------
 9         | ernest jordan
 15        | ernest jordan     
 14        | k moribe 
 36        | k moribe

That is, I need the author_id for the names having duplicate appearances. I have tried this statement:

select author_id,count(author_name)
from author_data
group by author_name
having count(author_name)>1

But it's not working. How can I get this?

Upvotes: 8

Answers (4)

Saurabh_g

Reputation: 1

If you want the answer you mentioned in the question, the whole query will fetch for you but if you just want the duplicate one, you can use the inner query. You can use windows functions, Row, Dense rank also, to get your answers

select a.author_id, 
a.author_name 
from authors a JOIN
   (
    select author_name
    from authors
    group by author_name
    having count(author_name) >1
   ) as temp
on a.author_name = temp.author_name

Upvotes: 0

Erwin Brandstetter

Reputation: 656331

I suggest a window function in a subquery:

SELECT author_id, author_name  -- omit the name here if you just need ids
FROM (
   SELECT author_id, author_name
        , count(*) OVER (PARTITION BY author_name) AS ct
   FROM   author_data
   ) sub
WHERE  ct > 1;

You will recognize the basic aggregate function count(). It can be turned into a window function by appending an OVER clause - just like any other aggregate function.

This way it counts rows per partition. Voilá.

It has to be done in a subquery because the result cannot be referenced in the WHERE clause in the same SELECT (happens after WHERE). See:

Best way to get result count before LIMIT was applied

In older versions without window functions (v.8.3 or older) - or generally - this alternative performs pretty fast:

SELECT author_id, author_name  -- omit name, if you just need ids
FROM   author_data a
WHERE  EXISTS (
   SELECT FROM author_data a2
   WHERE  a2.author_name = a.author_name
   AND    a2.author_id <> a.author_id
   );

If you are concerned with performance, add an index on author_name.

Upvotes: 14

coisnepe

Reputation: 532

You could join the table onto itself, which is achievable with either of the following queries:

SELECT a1.author_id, a1.author_name
FROM authors a1
CROSS JOIN authors a2
  ON a1.author_id <> a2.author_id
  AND a1.author_name = a2.author_name;

-- 9 |ernest jordan
-- 15|ernest jordan
-- 14|k moribe
-- 36|k moribe

--OR

SELECT a1.author_id, a1.author_name
FROM authors a1
INNER JOIN authors a2
  WHERE a1.author_id <> a2.author_id
  AND a1.author_name = a2.author_name;

-- 9 |ernest jordan
-- 15|ernest jordan
-- 14|k moribe
-- 36|k moribe

Upvotes: 2

SoulTrain

Reputation: 1904

You are half way there already. You need to just use the identified Author_IDs and fetch the rest of the data.

try this..

SELECT author_id, author_name
FROM author_data
WHERE author_id in (select author_id
        from author_data
        group by author_name
        having count(author_name)>1)

Upvotes: 3

Find rows with duplicate values in a column

Answers (4)

Related Questions