SELECT de-normalized columns into separate records?

Question

I am playing around with SQL a little just so I am not completely ignorant about it if I am ever asked in a job interview. My friend was recently asked the following question at an interview and he couldn't get it and I asked somebody at work who knows SQL decently and he didn't know. Can you guys answer this problem for me and then explain how it works? Please?

*The problem*

Database normalization (or lack of normalization) often presents a challenge for developers.

Consider a database table of employees that contains three fields:

EmployeeID
EmployeeName
EmailAddresses

Every employee, identified by a unique EmployeeID, may have one or more comma-separated, @rockauto.com email address(es) in the EmailAddresses field.

The database table is defined below:

CREATE TABLE Employees
(
  EmployeeID int UNSIGNED NOT NULL PRIMARY KEY,
  EmployeeName varchar(50) NOT NULL,
  EmailAddresses varchar(40) NOT NULL ,
  PRIMARY KEY(EmployeeID)
);

For testing purposes, here is some sample data:

INSERT INTO Employees (EmployeeID, EmployeeName, EmailAddresses) VALUES
('1', 'Bill', 'bill@companyx.com'),
('2', 'Fred', 'fred@companyx.com,freddie@companyx.com'),
('3', 'Fred', 'fredsmith@companyx.com'),
('4', 'Joe', 'joe@companyx.com,joe_smith@companyx.com');

Your task is to write a single MySQL SELECT query that will show the following output for the sample data above:

Employee    EmailAddress
Bill    bill@companyx.com
Fred (2)    fred@companyx.com
Fred (2)    freddie@companyx.com
Fred (3)    fredsmith@companyx.com
Joe     joe@companyx.com
Joe     joe_smith@companyx.com

Please take note that because there is more than one person with the same name (in this case, "Fred"), the EmployeeID is included in parenthesis.

Your query is required to written in MySQL version 5.1.41 compatible syntax. You should assume that the ordering is accomplished using standard database ascending ordering: "ORDER BY EmployeeID ASC"

For this problem, you need to submit a single SQL SELECT query. Your query should be able to process a table of 1000 records in a reasonable amount of time.

Toni · Accepted Answer

only if you have less than 10000 emails.... is that acceptable?

select 
       if(t1.c > 1, concat(e.employeename, ' (', e.employeeid, ')'), e.employeename) as Employee,
       replace(substring(substring_index(e.EmailAddresses, ',', n.row), length(substring_index(e.EmailAddresses, ',', n.row - 1)) + 1), ',', '') EmailAddress 
from 
       (select employeename, count(*) as c from Employees group by employeename) as t1, 
       (select EmployeeID, length(EmailAddresses) - length(replace(EmailAddresses,',','')) + 1 as emails from Employees) as t2,
       (SELECT @row := @row + 1 as row FROM 
       (select 0 union all select 1 union all select 3 union all select 4 union all select 5 union all select 6 union all select 6 union all select 7 union all select 8 union all select 9) x,
       (select 0 union all select 1 union all select 3 union all select 4 union all select 5 union all select 6 union all select 6 union all select 7 union all select 8 union all select 9) x2, 
       (select 0 union all select 1 union all select 3 union all select 4 union all select 5 union all select 6 union all select 6 union all select 7 union all select 8 union all select 9) x3, 
       (select 0 union all select 1 union all select 3 union all select 4 union all select 5 union all select 6 union all select 6 union all select 7 union all select 8 union all select 9) x4, 
       (SELECT @row:=0) as ff) as n,
       Employees e
where 
      e.employeename = t1.employeename and
      e.employeeid = t2.employeeid and
      n.row <= t2.emails
order by e.employeeid;

EDIT:

With less useless numbers generated:

select 
       if(t1.c > 1, concat(e.EmployeeName, ' (', e.EmployeeID, ')'), e.EmployeeName) as Employee,
       replace(substring(substring_index(e.EmailAddresses, ',', n.row), length(substring_index(e.EmailAddresses, ',', n.row - 1)) + 1), ',', '') as EmailAddress 
from 
       (select EmployeeName, count(*) as c from Employees group by EmployeeName) as t1, 
       (select EmployeeID, length(EmailAddresses) - length(replace(EmailAddresses,',','')) + 1 as emails from Employees) as t2,
       (select `1` as row from (select 1 union all select 2 union all select 3 union all select 4) x) as n,
       Employees e
where 
      e.EmployeeName = t1.EmployeeName and
      e.EmployeeID = t2.EmployeeID and
      n.row <= t2.emails
order by e.EmployeeID;

And what did we learn? Poor database design results awful queries. And you can do stuff with SQL, that are probably supported only because people do poor database designs... :)

SELECT de-normalized columns into separate records?

Answers (1)

Related Questions