Reputation: 359
I am currently working on the employees' benefits data. However, the spreadsheet data is totally in a mess. I would like to format it as easy to capture the information. The current formatting is as below:
Relationship EmployeeName BenefitCode BenefitOption Name
Alice DEN EEC
CHL Alice DEN EEC John
SPS Alice MED Lee
Lily VIS
SPS Lily VIS Tom
I would like to transfer it like this:
Relationship Name MED DEN VIS
Employee Alice EEC
CHL John EEC
SPS Lee MED
Employee Lily VIS
SPS Tom VIS
I tried grouping the data by names and BenefitCode, but I just got so confused about it.
My codes are as below:
SELECT RelationshipCode, EmployeeName,
MAX(IF(BenefitCode = "DEN", BenefitOptionCode , NULL)) AS DEN,
MAX(IF(BenefitCode = "MED", BenefitOptionCode , NULL)) AS MEDICAL,
MAX(IF(BenefitCode = "VIS", BenefitOptionCode , NULL)) AS VISION
FROM `TableXXX`
WHERE RelationshipCode = 'Employee'
GROUP BY EmployeeName, RelationshipCode
But it seems like not a good idea to lose the dependents' relationship to the employee. Can anyone tell me how to transfer the vertical data into horizontal? Or do you have any good idea to solve it?
Upvotes: 0
Views: 1063
Reputation: 173210
Below is for BigQuery Standard SQL
#standardSQL
SELECT
EmployeeName,
IF(Relationship IS NULL, 'Self', Relationship) Relationship,
IFNULL(Name, EmployeeName) Name,
MAX(IF(BenefitCode = 'DEN', IFNULL(BenefitOption, BenefitCode), NULL)) AS DEN,
MAX(IF(BenefitCode = 'MED', IFNULL(BenefitOption, BenefitCode), NULL)) AS MEDICAL,
MAX(IF(BenefitCode = 'VIS', IFNULL(BenefitOption, BenefitCode), NULL)) AS VISION
FROM `project.dataset.table`
GROUP BY Name, EmployeeName, Relationship
-- ORDER BY Name, Relationship
If to apply to sample data from your question - result is
Row EmployeeName Relationship Name DEN MEDICAL VISION
1 Alice Self Alice EEC null null
2 Alice CHL John EEC null null
3 Alice SPS Lee null MED null
4 Lily Self Lily null null VIS
5 Lily SPS Tom null null VIS
Another option would be extend above flatten version into "hierarchical"
#standardSQL
SELECT EmployeeName,
ARRAY_AGG(STRUCT(Name, Relationship, DEN, MEDICAL, VISION)) benefits
FROM (
SELECT
EmployeeName,
IF(Relationship IS NULL, 'Self', Relationship) Relationship,
IFNULL(Name, EmployeeName) Name,
MAX(IF(BenefitCode = 'DEN', IFNULL(BenefitOption, BenefitCode), NULL)) AS DEN,
MAX(IF(BenefitCode = 'MED', IFNULL(BenefitOption, BenefitCode), NULL)) AS MEDICAL,
MAX(IF(BenefitCode = 'VIS', IFNULL(BenefitOption, BenefitCode), NULL)) AS VISION
FROM `project.dataset.table`
GROUP BY Name, EmployeeName, Relationship
)
GROUP BY EmployeeName
-- ORDER BY EmployeeName
In this case, result will be
Row EmployeeName benefits.Name benefits.Relationship benefits.DEN benefits.MEDICAL benefits.VISION
1 Alice Alice Self EEC null null
John CHL EEC null null
Lee SPS null MED null
2 Lily Lily Self null null VIS
Tom SPS null null VIS
Upvotes: 2
Reputation: 3638
I would probably organize this into CTEs, making each column (or concept) it's own logical CTE.
with people as (
select distinct EmployeeName as person from <dataset>.<table> union distinct
select distinct Name as person from <dataset>.table
),
med as (
-- select people with MED columns
),
den as (
-- select people with DEN columns
),
... (etc)
joined as (
select * from people
left join med using(person)
left join den using(person)
)
select * from joined
My general advice for this type of scenario is to start with what you know (like how I started with MED and DEN). After these easier items are complete, you move onto items that are more complicated or require assumptions. Breaking them up into CTE blocks helps encapsulate each idea.
We also obviously don't know your data or if this is even a real-world task, but you might have some caveats that would require more detailed logic (people with same names, multi-generational relationships, etc)
Upvotes: 0