Cmac
Cmac

Reputation: 21

How to count last names in a table without duplicating employee ID

I have an employee table with duplicate instances of employees. For instance the last name Baba may show up 2 times with the same employee ID. I have to count last names from the table, but do not want to count the same one twice.

I am writing SQL in Postgres. Here is the table from which I draw my query:

CREATE TABLE Employee (
    emp_no int   NOT NULL,
    birth_date date   NOT NULL,
    first_name varchar(100)   NOT NULL,
    last_name varchar(100)   NOT NULL,
    gender varchar(100)   NOT NULL,
    hire_date date   NOT NULL,
    CONSTRAINT pk_Salaries PRIMARY KEY (
        emp_no
     )
);

The data was given and contained duplicates. I cannot remove the duplicates but do not want to count them. Here is my query statement:

SELECT Employee.last_name, COUNT(Employee.last_name) AS "Last Name Count"
FROM Employee
GROUP BY Employee.last_name
ORDER BY "Last Name Count" DESC;

The output works well but I am sure it is counting some last names more than once. I have tried adding a WHERE cause to get a count of last names where the emp_no is distinct but it does not work.

Upvotes: 0

Views: 1537

Answers (5)

Mehdi
Mehdi

Reputation: 81

You want to count last names from the table, but do not count the same one twice. So try this :

"SELECT Employee.last_name, COUNT(DISTINCT Employee.last_name) AS "Last Name Count" FROM Employee GROUP BY Employee.last_name"

Upvotes: 1

Cmac
Cmac

Reputation: 21

Thank you all for your quick responses. They were all very good and helpful!

I ran the following code to find that I was wrong and each individual had only one instance in the table and had only one unique employee ID (emp_no).

SELECT Employee.emp_no, COUNT(Employee.emp_no) AS "Employee ID Count" FROM Employee GROUP BY Employee.emp_no ORDER BY "Employee ID Count" ASC;

Again, thank you all very much!

Upvotes: 0

Frederic
Frederic

Reputation: 1028

You should try validating if the first name is counted uniquely by each last name

something like this

SELECT Employee.last_name, COUNT(distinct Employee.first_name) AS "Last Name Count"
FROM Employee
GROUP BY Employee.last_name
ORDER BY "Last Name Count" DESC;

see fiddle

https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=f0a9568e6cb5fb5e0247d2f2c5e95114

or if necessary check if more data is repeating in both lines, doing something like

select distinct * from (
SELECT Employee.last_name, 
COUNT(*) over (partition by first_name, birth_date, last_name, gender) AS n
FROM Employee
) V
where n > 1

see the fiddle

https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=223143f0d603abf30d99ad87fa07781e

Upvotes: 0

Barbaros Özhan
Barbaros Özhan

Reputation: 65363

Just use distinct keyword during applying the COUNT() aggregation :

SELECT e.last_name, COUNT(distinct e.last_name) AS "Last Name Count"
  FROM Employee e
 GROUP BY e.last_name
 ORDER BY "Last Name Count" DESC;

Upvotes: 0

GizmoZa
GizmoZa

Reputation: 106

The emp_no is a primary key, so it has to be unique and a where clause with distinct would have no impact. The query seems to be accurate, I'd be surprised if it's counting last names more than once.

Upvotes: 0

Related Questions