Carson P
Carson P

Reputation: 313

How To Cross Reference Two Big Data Tables With Google BigQuery SQL?

I have two medical data tables that I'm looking to find the hospitals that are common in each table and use two of the fields associated with each hospital in their respective tables. But keep in mind they are in two different tables.

This is what I have so far, but it always comes back with no results.

SELECT
  Spending / Rating AS Ratio,
  Provider_ID_Info AS ID
FROM (
  SELECT
    Period,
    Provider_ID_Spend,
    Avg_Spending_Per_Episode_Hospital as Spending,
    Provider_ID_Info,
    Hospital_overall_rating as Rating
  FROM
    [OmniHealth.HospitalSpending],
    [OmniHealth.HospitalGeneralInfo]
  WHERE
    Provider_ID_Info = Provider_ID_Spend  
  GROUP BY
    Rating,
    Spending,
    Provider_ID_Spend,
    Avg_Spending_Per_Episode_Hospital,
    Provider_ID_Info,
    Hospital_overall_rating,
    Period 
    )

  GROUP BY
    Ratio,
    ID,
    Spending,
    Rating

  ORDER BY
    Ratio

Upvotes: 0

Views: 300

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173200

The problem with your query was in fact that in BigQuery legacy SQL - comma is used as a UNION ALL - not as JOIN!!!

So, try below for BigQuery Legacy SQL (as it is a sql dialect of query in your question )

#legacySQL
SELECT
  Spending / Rating AS Ratio,
  Provider_ID_Info AS ID
FROM (
  SELECT
    Period,
    Provider_ID_Spend,
    Avg_Spending_Per_Episode_Hospital AS Spending,
    Provider_ID_Info,
    Hospital_overall_rating AS Rating
  FROM
    [OmniHealth.HospitalSpending] s
  JOIN
    [OmniHealth.HospitalGeneralInfo] i
  ON
    i.Provider_ID_Info = s.Provider_ID_Spend  
  GROUP BY
    Rating,
    Spending,
    Provider_ID_Spend,
    Provider_ID_Info,
    Period 
)
GROUP BY
  Ratio,
  ID,
  Spending,
  Rating
ORDER BY
  Ratio   

It is recommended to work with BigQuery Standard SQL

#standardSQL
SELECT
  Spending / Rating AS Ratio,
  Provider_ID_Info AS ID
FROM (
  SELECT
    Period,
    Provider_ID_Spend,
    Avg_Spending_Per_Episode_Hospital AS Spending,
    Provider_ID_Info,
    Hospital_overall_rating AS Rating
  FROM
    `OmniHealth.HospitalSpending` s
  JOIN
    `OmniHealth.HospitalGeneralInfo` i
  ON
    i.Provider_ID_Info = s.Provider_ID_Spend  
  GROUP BY
    Rating,
    Spending,
    Provider_ID_Spend,
    Provider_ID_Info,
    Period 
)
GROUP BY
  Ratio,
  ID,
  Spending,
  Rating
ORDER BY
  Ratio    

Please note: you might be having issues with ambiguous fields if you have any as result of JOIN - so you will need to use respective aliases - s or i

Update:
also removed extra columns in GROUP BY

Yet another update for run with real data

Ratio   ID
   0.0  10019
   9.0  10019
  39.5  10019
  86.0  10019
 236.5  10019
 458.5  10019
 485.0  10019
 531.0  10019
1259.0  10019
1772.0  10019
8834.0  10019

It is totally another story if this makes sense or not - but it is your query/logic - so up to you

Upvotes: 2

Related Questions