Reputation: 21
I have an employee table with duplicate instances of employees. For instance the last name Baba may show up 2 times with the same employee ID. I have to count last names from the table, but do not want to count the same one twice.
I am writing SQL in Postgres. Here is the table from which I draw my query:
CREATE TABLE Employee (
emp_no int NOT NULL,
birth_date date NOT NULL,
first_name varchar(100) NOT NULL,
last_name varchar(100) NOT NULL,
gender varchar(100) NOT NULL,
hire_date date NOT NULL,
CONSTRAINT pk_Salaries PRIMARY KEY (
emp_no
)
);
The data was given and contained duplicates. I cannot remove the duplicates but do not want to count them. Here is my query statement:
SELECT Employee.last_name, COUNT(Employee.last_name) AS "Last Name Count"
FROM Employee
GROUP BY Employee.last_name
ORDER BY "Last Name Count" DESC;
The output works well but I am sure it is counting some last names more than once. I have tried adding a WHERE cause to get a count of last names where the emp_no is distinct but it does not work.
Upvotes: 0
Views: 1537
Reputation: 81
You want to count last names from the table, but do not count the same one twice. So try this :
"SELECT Employee.last_name, COUNT(DISTINCT Employee.last_name) AS "Last Name Count" FROM Employee GROUP BY Employee.last_name"
Upvotes: 1
Reputation: 21
Thank you all for your quick responses. They were all very good and helpful!
I ran the following code to find that I was wrong and each individual had only one instance in the table and had only one unique employee ID (emp_no).
SELECT Employee.emp_no, COUNT(Employee.emp_no) AS "Employee ID Count" FROM Employee GROUP BY Employee.emp_no ORDER BY "Employee ID Count" ASC;
Again, thank you all very much!
Upvotes: 0
Reputation: 1028
You should try validating if the first name is counted uniquely by each last name
something like this
SELECT Employee.last_name, COUNT(distinct Employee.first_name) AS "Last Name Count"
FROM Employee
GROUP BY Employee.last_name
ORDER BY "Last Name Count" DESC;
see fiddle
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=f0a9568e6cb5fb5e0247d2f2c5e95114
or if necessary check if more data is repeating in both lines, doing something like
select distinct * from (
SELECT Employee.last_name,
COUNT(*) over (partition by first_name, birth_date, last_name, gender) AS n
FROM Employee
) V
where n > 1
see the fiddle
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=223143f0d603abf30d99ad87fa07781e
Upvotes: 0
Reputation: 65363
Just use distinct
keyword during applying the COUNT()
aggregation :
SELECT e.last_name, COUNT(distinct e.last_name) AS "Last Name Count"
FROM Employee e
GROUP BY e.last_name
ORDER BY "Last Name Count" DESC;
Upvotes: 0
Reputation: 106
The emp_no is a primary key, so it has to be unique and a where clause with distinct would have no impact. The query seems to be accurate, I'd be surprised if it's counting last names more than once.
Upvotes: 0