PMa
PMa

Reputation: 1771

MySQL - Reshape Data

I have the following data set (sample):

emplid | Citizeship |
100001 | USA        |
100001 | CAN        |
100001 | CHN        |
100002 | USA        |
100002 | CHN        |
100003 | USA        |

Is there a way to transform the data into the following:

emplid | Citizeship_1 | Citizenship_2 | Citizenship_3
100001 | USA          | CHN           | CAN
100002 | USA          | CHN           |
100003 | USA          |               |

The assumption is that each emplid will have up to 4 citizenships.

I started with the following codes, but for the emplids who just have 1 citizenship, the value is being repeated in the citizenship_2, citizenship_3, which should be just blank:

 select *
 , substring_index(Citizenship_multiple, ',', 1) as Citizenship_1
 , substring_index(substring_index(Citizenship_multiple,',',-1),',',1) as Citizenship_2
 , substring_index(substring_index(Citizenship_multiple,',',-2),',',1) as Citizenship_3
 , substring_index(substring_index(Citizenship_multiple,',',-3),',',1) as Citizenship_4 
 from
     (select * 
      , group_concat(distinct Citizenship) as Citizenship_multiple
      from `citizenship_csv_meta` 
       group by emplid) a

Upvotes: 1

Views: 260

Answers (2)

AdamMc331
AdamMc331

Reputation: 16720

I know you stated hardcoding was a pain, and likely not the best solution, but I was able to do this while using only one assumption: that an employee can have at most 4 citizenships. So, I just joined your table together 4 times. I had to use an outer join, because not every employee would have 4 citizenships. Here is the code, and I will explain what I did:

SELECT e.emplid, MAX(e.citizenship) AS citizenship1, 
                 MAX(e1.citizenship) AS citizenship2, 
                 MAX(e2.citizenship) AS citizenship3, 
                 MAX(e3.citizenship) AS citizenship4
FROM employee e
LEFT JOIN employee e1 ON e1.emplid = e.emplid AND e1.citizenship < e.citizenship
LEFT JOIN employee e2 ON e2.emplid = e1.emplid AND e2.citizenship < e1.citizenship
LEFT JOIN employee e3 ON e3.emplid = e2.emplid AND e3.citizenship < e2.citizenship
GROUP BY e.emplid

I joined your table together 4 times, and took the MAX() citizenship from each group. The reason this works is because in the join condition i used e1.citizenship < e.citizenship to make sure that the previous values weren't included. For example, table e2 never included USA, so I was able to use the max function again.

What this will do is that once an employee no longer has a citizenship, the cell in the remaining columns is null, so you will need to be aware of that.

This tested beautifully on SQL Fiddle, and I actually referenced this question to figure out how to get the succeeding citizenships. Of course, I used a method slightly different from theres, but I want to give credit where credit is due.

EDIT

If you want the null cells replaced with a blank value, refer to this SQL Fiddle.

Upvotes: 1

radar
radar

Reputation: 13425

you can do it with case and max

SELECT emplid,
       max(case when Citizeship = 'USA' then 'USA' else '' end) as Citizeship_1,
       max(case when Citizeship = 'CHN' then 'CHN' else '' end) as Citizeship_2,
       max(case when Citizeship = 'CAN' then 'CAN' else '' end) as Citizeship_3
FROM citizenship_csv_meta
GROUP BY emplid

Upvotes: 1

Related Questions