Reputation: 1330
I have a sql Employee
Table which describes how much a user likes a particular metal , the table looks like this
"Employee_number" "Rank_1 "Rank_2 "Rank_3 "Rank_4 "Rank5
1 Gold null null null null
2 bronze Gold null null null
3 Gold platinum null null null
4 Gold copper null null null
5 Gold bronze platinum null null
6 Gold bronze platinum null null
7 Gold platinum Silver null null
8 Gold platinum Silver null null
9 Gold platinum business null null
10 null null null null null
11 Silver bronze business platinum Gold
The Employee_number field is a unique field ,
There is also a table which describes the general ranking of the Metals, it looks like this :
Metal Rank
Gold 1
platinum 2
silver 3
copper 4
bronze 5
What i am trying to do is that whenever there is a null value for an employee , fill in with the default metals based on their ranking
eg -> For Employee 10 : All values are null , easy , His rank_1 metal would be Gold , rank2_metal would be Platinum , rank3_metal would be Silver , rank 4_metal would be copper , rank 5_metal would be bronze
Now for employee_1 , he has a rank_1 metal already , but no other ranks are avialble , so replace rank2_metal with Platinum , rank_3 metal with silver , rank_4 metal with copper , rank_5 metal with bronze
Now for employee_2 , he has bronze as his first metal and Gold as the second metal , his rank3_metal would be Platinum , rank_4 metal would silver , rank5_metal would be copper
similary , lets take the case of employee_6 he has three ranks filled , need to fill in rank 4 and 5 , his rank_4 metals would be silver and rank_5 metal would be copper
Does anyone have any suggestion on how this could be one in sql , i am using bigquery
Upvotes: 1
Views: 1201
Reputation: 172993
Below is for BigQuery Standard SQL - hope you will adopt this for your real use case.
#standardSQL
WITH metals AS (
SELECT 'Gold' Metal, 1 RANK UNION ALL SELECT 'platinum', 2 UNION ALL
SELECT 'silver', 3 UNION ALL SELECT 'copper', 4 UNION ALL SELECT 'bronze', 5
)
SELECT Employee_number,
MAX(IF(pos=0, Metal, NULL)) Rank_1,
MAX(IF(pos=1, Metal, NULL)) Rank_2,
MAX(IF(pos=2, Metal, NULL)) Rank_3,
MAX(IF(pos=3, Metal, NULL)) Rank_4,
MAX(IF(pos=4, Metal, NULL)) Rank_5
FROM (
SELECT Employee_number,
ARRAY_CONCAT(
ARRAY(SELECT Metal FROM (
SELECT 1 a, Rank_1 Metal UNION ALL SELECT 2, Rank_2 UNION ALL
SELECT 3, Rank_3 UNION ALL SELECT 4, Rank_4 UNION ALL
SELECT 5, Rank_5 )
WHERE NOT Metal IS NULL
ORDER BY a
), ARRAY(SELECT Metal FROM metals m
WHERE NOT LOWER(Metal) IN (
SELECT x FROM UNNEST(ARRAY(
SELECT LOWER(b) FROM (
SELECT Rank_1 b UNION ALL SELECT Rank_2 UNION ALL
SELECT Rank_3 UNION ALL SELECT Rank_4 UNION ALL
SELECT Rank_5 )
WHERE NOT b IS NULL
)) x
) ORDER BY RANK
)) arr
FROM `project.dataset.employee`
), UNNEST(arr) Metal WITH OFFSET pos
GROUP BY Employee_number
ORDER BY Employee_number
You can test, play with above using dummy data from your question as below
#standardSQL
WITH `project.dataset.employee` AS (
SELECT 1 Employee_number, 'Gold' Rank_1, NULL Rank_2, NULL Rank_3, NULL Rank_4, NULL Rank_5 UNION ALL
SELECT 2, 'bronze', 'Gold', NULL, NULL, NULL UNION ALL
SELECT 3, 'Gold', 'platinum', NULL, NULL, NULL UNION ALL
SELECT 4, 'Gold', 'copper', NULL, NULL, NULL UNION ALL
SELECT 5, 'Gold', 'bronze', 'platinum', NULL, NULL UNION ALL
SELECT 6, 'Gold', 'bronze', 'platinum', NULL, NULL UNION ALL
SELECT 7, 'Gold', 'platinum', 'Silver', NULL, NULL UNION ALL
SELECT 8, 'Gold', 'platinum', 'Silver', NULL, NULL UNION ALL
SELECT 9, 'Gold', 'platinum', 'business', NULL, NULL UNION ALL
SELECT 10, NULL, NULL, NULL, NULL, NULL UNION ALL
SELECT 11, 'Silver', 'bronze', 'business', 'platinum', 'Gold'
), metals AS (
SELECT 'Gold' Metal, 1 RANK UNION ALL SELECT 'platinum', 2 UNION ALL
SELECT 'silver', 3 UNION ALL SELECT 'copper', 4 UNION ALL SELECT 'bronze', 5
)
SELECT Employee_number,
MAX(IF(pos=0, Metal, NULL)) Rank_1,
MAX(IF(pos=1, Metal, NULL)) Rank_2,
MAX(IF(pos=2, Metal, NULL)) Rank_3,
MAX(IF(pos=3, Metal, NULL)) Rank_4,
MAX(IF(pos=4, Metal, NULL)) Rank_5
FROM (
SELECT Employee_number,
ARRAY_CONCAT(
ARRAY(SELECT Metal FROM (
SELECT 1 a, Rank_1 Metal UNION ALL SELECT 2, Rank_2 UNION ALL
SELECT 3, Rank_3 UNION ALL SELECT 4, Rank_4 UNION ALL
SELECT 5, Rank_5 )
WHERE NOT Metal IS NULL
ORDER BY a
), ARRAY(SELECT Metal FROM metals m
WHERE NOT LOWER(Metal) IN (
SELECT x FROM UNNEST(ARRAY(
SELECT LOWER(b) FROM (
SELECT Rank_1 b UNION ALL SELECT Rank_2 UNION ALL
SELECT Rank_3 UNION ALL SELECT Rank_4 UNION ALL
SELECT Rank_5 )
WHERE NOT b IS NULL
)) x
) ORDER BY RANK
)) arr
FROM `project.dataset.employee`
), UNNEST(arr) Metal WITH OFFSET pos
GROUP BY Employee_number
ORDER BY Employee_number
with result
Row Employee_number Rank_1 Rank_2 Rank_3 Rank_4 Rank_5
1 1 Gold platinum silver copper bronze
2 2 bronze Gold platinum silver copper
3 3 Gold platinum silver copper bronze
4 4 Gold copper platinum silver bronze
5 5 Gold bronze platinum silver copper
6 6 Gold bronze platinum silver copper
7 7 Gold platinum Silver copper bronze
8 8 Gold platinum Silver copper bronze
9 9 Gold platinum business silver copper
10 10 Gold platinum silver copper bronze
11 11 Silver bronze business platinum Gold
Note: above solution assumes there is no mix between filled and NULL Metal, meaning three options:
1. all Rank fields filled already with Metal
2. all Rank fields are NULL
3. first 1 or more fields filled with Metal and rest are NULLs
With this said, first array is built off of filled fields; second array built off of rest of Metal fields from Metals table; then two arrays are concatenated and first 5 elements are used to recreate original table
Hope this is not too messy
P.S. above solution can relatively easy be extended to the case when NULLs and filled Metals are in mix - but looks like this is out of question's scope :o)
Upvotes: 3