Savita Mhetar
Savita Mhetar

Reputation: 1

Google Bigquery use of substr, never returns back results

I have a table which has two sets of data, one set of data has information like

Type | Name | Id

PackagedDrug |Pseudoephedrine HCl Oral Tablet 120 MG| 110

PackagedDrug |Pseudoephedrine HCl Oral Tablet 60 MG|111

DrugName| Pseudoephedrine HCl| 112


What I want to do is join PackagedDrug with DrugName concepts, so get all Ids for Type PackagedDrug whose Name is matching with Name for Type DrugName. If I hardcode the Name for DrugName in the following query, it runs instantenously, but if I take out the hardcoding then it just keeps on running. Could you please suggest me suitable ways to speed up the big query?

SELECT a.MSC_ID MSC_id, a.MSC_CONcept_type, a.concept_id, a.concept_name , b.concept_name

from 
(select MSC_id, MSC_CONcept_type, concept_id, concept_name  
   FROM [ClientAlerts.MSC_Concepts] 
  where MSC_CONcept_type in ('MediSpan.Concepts.PackagedDrug') ) a

CROSS JOIN

(select MSC_CONcept_type, concept_id, concept_name , length(concept_name) len
FROM [ClientAlerts.MSC_Concepts] where MSC_CONcept_type in ('MediSpan.Concepts.NamebasedClassification.DrugName') -- and concept_name in ('Pseudoephedrine HCl') ) b

where substr(a.concept_name,1,b.len)+' ' = b.concept_name

Thanks, Savita

Upvotes: 0

Views: 130

Answers (1)

Patrice
Patrice

Reputation: 4692

This has nothing to do with BigQuery itself. When you hardcode, your values are "filtered" way faster, because it doesn't have to check every row, since it looks for the hardcoded value.

If you don't use the hardcoded value, it will look at WAY more rows, compare ALL the rows from your first query with your second. Honestly, if you describe your use case properly here, I don't think of any way to do this faster.

But one question does come to mind. Why do you have a "type". It seems like it should be two different tables instead.

Upvotes: 0

Related Questions