Sandy
Sandy

Reputation: 359

How to transpose the vertical data into horizontal by using BigQuery?

I am currently working on the employees' benefits data. However, the spreadsheet data is totally in a mess. I would like to format it as easy to capture the information. The current formatting is as below:

Relationship EmployeeName  BenefitCode  BenefitOption  Name  
               Alice          DEN         EEC           
  CHL          Alice          DEN         EEC          John
  SPS          Alice          MED                      Lee
               Lily           VIS                      
  SPS          Lily           VIS                       Tom

I would like to transfer it like this:

Relationship    Name     MED    DEN    VIS 
Employee        Alice           EEC
CHL             John            EEC
SPS             Lee      MED
Employee        Lily                   VIS
SPS             Tom                    VIS

I tried grouping the data by names and BenefitCode, but I just got so confused about it.

My codes are as below:

SELECT   RelationshipCode, EmployeeName, 
         MAX(IF(BenefitCode = "DEN", BenefitOptionCode , NULL)) AS DEN,
         MAX(IF(BenefitCode = "MED", BenefitOptionCode , NULL)) AS MEDICAL,
         MAX(IF(BenefitCode = "VIS", BenefitOptionCode , NULL)) AS VISION
FROM `TableXXX` 
WHERE RelationshipCode = 'Employee'
GROUP BY EmployeeName, RelationshipCode

But it seems like not a good idea to lose the dependents' relationship to the employee. Can anyone tell me how to transfer the vertical data into horizontal? Or do you have any good idea to solve it?

Upvotes: 0

Views: 1063

Answers (2)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173210

Below is for BigQuery Standard SQL

#standardSQL
SELECT 
  EmployeeName,
  IF(Relationship IS NULL, 'Self', Relationship) Relationship, 
  IFNULL(Name, EmployeeName) Name, 
  MAX(IF(BenefitCode = 'DEN', IFNULL(BenefitOption, BenefitCode), NULL)) AS DEN,
  MAX(IF(BenefitCode = 'MED', IFNULL(BenefitOption, BenefitCode), NULL)) AS MEDICAL,
  MAX(IF(BenefitCode = 'VIS', IFNULL(BenefitOption, BenefitCode), NULL)) AS VISION  
FROM `project.dataset.table`
GROUP BY Name, EmployeeName, Relationship 
-- ORDER BY Name, Relationship

If to apply to sample data from your question - result is

Row EmployeeName    Relationship    Name    DEN     MEDICAL VISION   
1   Alice           Self            Alice   EEC     null    null     
2   Alice           CHL             John    EEC     null    null     
3   Alice           SPS             Lee     null    MED     null     
4   Lily            Self            Lily    null    null    VIS  
5   Lily            SPS             Tom     null    null    VIS    

Another option would be extend above flatten version into "hierarchical"

#standardSQL
SELECT EmployeeName,
  ARRAY_AGG(STRUCT(Name, Relationship, DEN, MEDICAL, VISION)) benefits
FROM (
  SELECT 
    EmployeeName,
    IF(Relationship IS NULL, 'Self', Relationship) Relationship, 
    IFNULL(Name, EmployeeName) Name, 
    MAX(IF(BenefitCode = 'DEN', IFNULL(BenefitOption, BenefitCode), NULL)) AS DEN,
    MAX(IF(BenefitCode = 'MED', IFNULL(BenefitOption, BenefitCode), NULL)) AS MEDICAL,
    MAX(IF(BenefitCode = 'VIS', IFNULL(BenefitOption, BenefitCode), NULL)) AS VISION  
  FROM `project.dataset.table`
  GROUP BY Name, EmployeeName, Relationship 
) 
GROUP BY EmployeeName
-- ORDER BY EmployeeName

In this case, result will be

Row EmployeeName    benefits.Name   benefits.Relationship   benefits.DEN    benefits.MEDICAL    benefits.VISION  
1   Alice           Alice           Self                    EEC             null                null     
                    John            CHL                     EEC             null                null     
                    Lee             SPS                     null            MED                 null       
2   Lily            Lily            Self                    null            null                VIS  
                    Tom             SPS                     null            null                VIS  

Upvotes: 2

rtenha
rtenha

Reputation: 3638

I would probably organize this into CTEs, making each column (or concept) it's own logical CTE.

with people as (
  select distinct EmployeeName as person from <dataset>.<table> union distinct
  select distinct Name as person from <dataset>.table
),
med as (
  -- select people with MED columns
),
den as (
  -- select people with DEN columns
),
... (etc)
joined as (
  select * from people
  left join med using(person)
  left join den using(person)
)
select * from joined

My general advice for this type of scenario is to start with what you know (like how I started with MED and DEN). After these easier items are complete, you move onto items that are more complicated or require assumptions. Breaking them up into CTE blocks helps encapsulate each idea.

We also obviously don't know your data or if this is even a real-world task, but you might have some caveats that would require more detailed logic (people with same names, multi-generational relationships, etc)

Upvotes: 0

Related Questions