Mask
Mask

Reputation: 573

Populate column based on row values BigQuery Standard SQL

I have a Table lets say :-

  Name   A    B    C    D    
------- ---  ---  ---  --- 
 alpha   0    1    0   0.6     
 beta   0.6   0    0   0.1
 gama    0    0    0   0.6

Now I want to populate values on Two columns(Result & Class) based on A, B, C, D values.

The condition is if value in any of the field(A,B,C,D) is >.5 then Result column should have "F" else it should have "P". Also the column whose valie is >.5 should be in Class example("A,D")

For better understanding here is the result I want:-

  Name   A    B    C    D    Result    Class
------- ---  ---  ---  ---  --------  -------
 alpha   0    1    0   0.6     F        B,D      
 beta   0.6   0    0   0.1     F         A
 gama    0    0    0   0.4     P        NULL 

I am New to BigQuery and need Help. What would be workaround.

This what I have done till yet

  SELECT *, CASE WHEN (A > .5 OR B > .5 OR C > .5 OR D >.5)
            THEN 'F'
            ELSE 'P' END AS Result AND Class....//here i am stuck
  
  FROM table1

Actually, I have no Idea how to Build this exact Script. I was able to achieve first part where I was able to Populate Result column with "F" and "P" but could not make Class to populate column names....

Upvotes: 1

Views: 1379

Answers (2)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

Below is for BigQuery Standard SQL

Using javaScript UDF helps in many cases but should be avoid if problem can be solved with SQL as in below example

#standardSQL
SELECT *,
  ( SELECT IF(LOGICAL_OR(val > 0.5), 'F', 'P') 
    FROM UNNEST([A,B,C,D]) val
  ) AS Result,
  ( SELECT STRING_AGG(['A','B','C','D'][OFFSET(pos)]) 
    FROM UNNEST([A,B,C,D]) val WITH OFFSET pos 
    WHERE val > 0.5
  ) AS Class
FROM `project.dataset.table`  

You can test , play with above using sample data from y our question as in below example

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'alpha' name, 0 A, 1 B, 0 C, 0.6 D UNION ALL
  SELECT 'beta', 0.6, 0, 0, 0.1 UNION ALL
  SELECT 'gamma', 0, 0, 0, 0.4 
)
SELECT *,
  ( SELECT IF(LOGICAL_OR(val > 0.5), 'F', 'P') 
    FROM UNNEST([A,B,C,D]) val
  ) AS Result,
  ( SELECT STRING_AGG(['A','B','C','D'][OFFSET(pos)]) 
    FROM UNNEST([A,B,C,D]) val WITH OFFSET pos 
    WHERE val > 0.5
  ) AS Class
FROM `project.dataset.table`    

with output as

Row name    A       B   C   D       Result  Class    
1   alpha   0.0     1   0   0.6     F       B,D  
2   beta    0.6     0   0   0.1     F       A    
3   gamma   0.0     0   0   0.4     P       null       

Upvotes: 2

Alexandre Moraes
Alexandre Moraes

Reputation: 4042

Since you are analysing each column, I assume you do not have a extensive quantity of columns. Therefore, I created a simple JavaScript User Defined Function (UDF) in order to check the row's value and return the column's name if the condition is met.

I have used the provided sample data to test the below query.

#javaScript UDF
CREATE TEMP FUNCTION class(A FLOAT64, B FLOAT64, C FLOAT64, D FLOAT64)
RETURNS String
LANGUAGE js AS """
var class_array=[];
if(A > 0.5){class_array.push("A");}
if(B > 0.5){class_array.push("B");}
if(C > 0.5){class_array.push("C");}
if(D > 0.5){class_array.push("D");} 

return class_array;
""";

#sample data
WITH data as (
 SELECT "alpha" as Name, 0 as A, 1 as B, 0 as C, 0.6 as D UNION ALL  
 SELECT "beta", 0.6, 0, 0, 0.1 UNION ALL
 SELECT "gama", 0, 0, 0, 0.4
)

Select name, A,B,C,D, 
        CASE WHEN (A > .5 OR B > .5 OR C > .5 OR D >.5) THEN "F" ELSE "P" END AS Result,
        IF(class(A,B,C,D) is null , null, class(A,B,C,D)) as Class from data

And the output,

Row name    A   B   C   D   Result  Class
1   alpha   0   1   0   0.6 F       B,D
2   beta    0.6 0   0   0.1 F       A
3   gama    0   0   0   0.4 P   

As it is shown within the UDF, each row's value is analysed and if the condition is met, the column's name is manually added to an array of strings. In addition, pay attention that the JS UDF returns a String, not an array. It automatically converts the previously created Array to String.

Lastly, I should point that is not possible to retrieve the column name within a query in this context. Although, you can retrieve it, in other scenarios, using INFORMATION_SCHEMA.

Upvotes: 2

Related Questions