The Great
The Great

Reputation: 7733

How to create new binary column by checking across all rows of a subject?

I have a table like as shown below

subject_id  Desc  Name   class   BC  FU  PA  VI 
1                Fung     FU      0   1   0  0  
1                Para     PA      0   0   1  0  
1                Viru     VI      0   0   0  1  
1                Para     PA      0   0   1  0  
1             T5 Bacte    BC      1   0   0  0  
1             T6 Bacte    BC      1   0   0  0 
2             T5 Bacte    BC      1   0   0  0
2                Fung     FU      1   0   0  0

What I would like to do is create a new column "BC_FU" which will have 1 if the subject has 1 in BC column and FU column. But it shouldn't be looking for in the same row but across all the records of the same subject.

The same logic applies for BC_VI column (another new column) as well

For ex: If you look at subject_id = 1, we can see that he has 1 in BC column at row 5 and 6 whereas he has FU = 1 at row 1. Now we know that subject_id has both BC and FU in this records (from database)

This is what I tried but doesn't help. I am trying to write in BigQuery, so your help to correct or fix this in BigQuery will be helpful

select *, 
CASE WHEN (MAX(BC) == 1 AND MAX(FU) == 1) THEN 1
ELSE 0 END AS BC_FU,
CASE WHEN (MAX(BC) == 1 AND MAX(VI) == 1) THEN 1
ELSE 0 END AS BC_VI,
FROM TABLE T
GROUP BY SUBJECT_ID

So, I would like to create an output whick looks like below

subject_id  Desc  Name  class  BC FU PA VI BC_FU BC_VI
1                 Fungi  FU     0  1  0 0   1     1
1                 Para   PA     0  0  1 0   1     1
1                 Virus  VI     0  0  0 1   1     1 
1                 Para   PA     0  0  1 0   1     1 
1             T5  Bacte  BC     1  0  0 0   1     1
1             T6  Bacte  BC     1  0  0 0   1     1
2             T5  Bacte  BC     1  0  0 0   1     1
2                 Virus  VI     0  1  0 1   1     1

Upvotes: 1

Views: 554

Answers (4)

Yosher
Yosher

Reputation: 805

I've read your question and let me add my solution as well.

SELECT  t1.*,
        CASE WHEN (MAX(BC) OVER (PARTITION BY subject_id) +
                 MAX(FU) OVER (PARTITION BY subject_id) ) == 2
             THEN 1
             ELSE 0
             END AS BC_FU
        CASE WHEN (MAX(BC) OVER(PARTITION BY subject_id) +
                  MAX(VI) OVER(PARTITION BY subject_id) ) == 2
             THEN 1
             ELSE 0
             END AS BC_VI
FROM    table as t1
ORDER BY subject_id

Here are some tips which I thought might be useful for you:)

  • OVER () is sometimes referred to as window function. If you type SELECT (aggregating function) OVER(PARTITION BY columnA) you can use the aggregation function but not turning the result into single row at the same time. (Please ignore this if you already know)

  • As Mr. Tim already mentioned, it's good to remove unnecessary sub-query (a query inside the other) to increase the readability.

  • Be sure to add 'Else 0' in every CASE sentence so as not to make the possibility of causing 'NULL'.

Here, I chose to sum up the maximum value of BC and FU to check if it's 2 or not, rather than not checking 'intersection' (e.g. max(BC) ==1 AND max(FU) ==1) like Mr.Tim's post.
That is because I thought you would probably add such column as 'BC_FU_VI' in the future,
when '2' would increase readability that this case sentence is actually trying to convert two columns into a single column.

Thank you.

Upvotes: 1

George Joseph
George Joseph

Reputation: 5932

You can do the following

select t1.*
       ,max(BC) over(partition by subject_id)
        *max(FU) over(partition by subject_id) as BC_FU
       ,max(BC) over(partition by subject_id)
        *max(VI) over(partition by subject_id) as BC_VI
  from your_table t1

Upvotes: 2

Tim Biegeleisen
Tim Biegeleisen

Reputation: 522074

If I understand your requirement correctly, you should just be able to use analytic functions here:

SELECT *,
    CASE WHEN MAX(BC) OVER (PARTITION BY subject_id) +
              MAX(FU) OVER (PARTITION BY subject_id) = 2 THEN 1 ELSE 0 END AS BC_FU,
    CASE WHEN MAX(BC) OVER (PARTITION BY subject_id) +
              MAX(VI) OVER (PARTITION BY subject_id) = 2 THEN 1 ELSE 0 END AS BC_VI
FROM yourTable
ORDER BY subject_id;

This answer avoids the need for an unnecessary subquery.

Upvotes: 1

juergen d
juergen d

Reputation: 204854

select t1.*, tmp.BC_FU, tmp.BC_VI
from your_table t1
join
(
  select subject_id, 
         CASE WHEN MAX(BC) + MAX(FU) = 2 THEN 1 ELSE 0 END AS BC_FU,
         CASE WHEN MAX(BC) + MAX(VI) = 2 THEN 1 ELSE 0 END AS BC_VI
  from your_table
  group by subject_id
) tmp on t1.subject_id = tmp.subject_id

Upvotes: 1

Related Questions