Reputation: 1771
I have the following data set (sample):
emplid | Citizeship |
100001 | USA |
100001 | CAN |
100001 | CHN |
100002 | USA |
100002 | CHN |
100003 | USA |
Is there a way to transform the data into the following:
emplid | Citizeship_1 | Citizenship_2 | Citizenship_3
100001 | USA | CHN | CAN
100002 | USA | CHN |
100003 | USA | |
The assumption is that each emplid
will have up to 4 citizenships.
I started with the following codes, but for the emplid
s who just have 1 citizenship
, the value is being repeated in the citizenship_2
, citizenship_3
, which should be just blank:
select *
, substring_index(Citizenship_multiple, ',', 1) as Citizenship_1
, substring_index(substring_index(Citizenship_multiple,',',-1),',',1) as Citizenship_2
, substring_index(substring_index(Citizenship_multiple,',',-2),',',1) as Citizenship_3
, substring_index(substring_index(Citizenship_multiple,',',-3),',',1) as Citizenship_4
from
(select *
, group_concat(distinct Citizenship) as Citizenship_multiple
from `citizenship_csv_meta`
group by emplid) a
Upvotes: 1
Views: 260
Reputation: 16720
I know you stated hardcoding was a pain, and likely not the best solution, but I was able to do this while using only one assumption: that an employee can have at most 4 citizenships. So, I just joined your table together 4 times. I had to use an outer join, because not every employee would have 4 citizenships. Here is the code, and I will explain what I did:
SELECT e.emplid, MAX(e.citizenship) AS citizenship1,
MAX(e1.citizenship) AS citizenship2,
MAX(e2.citizenship) AS citizenship3,
MAX(e3.citizenship) AS citizenship4
FROM employee e
LEFT JOIN employee e1 ON e1.emplid = e.emplid AND e1.citizenship < e.citizenship
LEFT JOIN employee e2 ON e2.emplid = e1.emplid AND e2.citizenship < e1.citizenship
LEFT JOIN employee e3 ON e3.emplid = e2.emplid AND e3.citizenship < e2.citizenship
GROUP BY e.emplid
I joined your table together 4 times, and took the MAX() citizenship from each group. The reason this works is because in the join condition i used e1.citizenship < e.citizenship
to make sure that the previous values weren't included. For example, table e2 never included USA
, so I was able to use the max function again.
What this will do is that once an employee no longer has a citizenship, the cell in the remaining columns is null, so you will need to be aware of that.
This tested beautifully on SQL Fiddle, and I actually referenced this question to figure out how to get the succeeding citizenships. Of course, I used a method slightly different from theres, but I want to give credit where credit is due.
EDIT
If you want the null cells replaced with a blank value, refer to this SQL Fiddle.
Upvotes: 1
Reputation: 13425
you can do it with case and max
SELECT emplid,
max(case when Citizeship = 'USA' then 'USA' else '' end) as Citizeship_1,
max(case when Citizeship = 'CHN' then 'CHN' else '' end) as Citizeship_2,
max(case when Citizeship = 'CAN' then 'CAN' else '' end) as Citizeship_3
FROM citizenship_csv_meta
GROUP BY emplid
Upvotes: 1