wyn
wyn

Reputation: 141

Trouble querying tables in multiple datasets in bigquery

I am trying to query two bigquery tables from two different datasets to get 2 separate columns. I have both tried union and joins but they are not giving me want I need. Below is the query I tried

with abagrowth as (
SELECT
  session abas,
  term abat,
  COUNT(distinct studentid) AS acount,
  ROUND(100 * (COUNT(distinct studentid) - LAG(COUNT(distinct studentid), 1) OVER (ORDER BY session)) / LAG(COUNT(distinct studentid), 1) OVER (ORDER BY session),0) || '%' AS agrowth
FROM
  aba.abaresult
GROUP BY
  1,
  2
ORDER BY
  1,
  2),

bidagrowth as (
SELECT
  session bidas,
  term bidat,
  COUNT(distinct studentid) AS bcount,
  ROUND(100 * (COUNT(distinct studentid) - LAG(COUNT(distinct studentid), 1) OVER (ORDER BY session)) / LAG(COUNT(distinct studentid), 1) OVER (ORDER BY session),0) || '%' AS bgrowth
FROM
  bida.bidaresult
GROUP BY
  1,
  2
ORDER BY
  1,
  2)

select abas, agrowth from abagrowth
union all
select bidas, bgrowth from bidagrowth

The dataset is similar to this

name  subject  session      totalscore
-------------------------------------------
jack  maths    2013/2014         70
jane  maths    2013/2014         65
jill  maths    2013/2014         80
jack  maths    2014/2015         72
jack  eng      2014/2015         87
jane  science  2014/2015         67
jill  maths    2014/2015         70
jerry eng      2014/2015         70
jaasp science  2014/2015         85

The table I am trying to get is meant to be in this format or something similar

session    agrowth  bgrowth
2013/2014   null     null
2014/2015   10%       11%
2015/2016   5%        2%

The figures above are assumed for example sake.

Questions

  1. Is this possible with bigquery?

  2. If yes, How can one achieve this?

Thanks

Upvotes: 0

Views: 138

Answers (1)

Tlaquetzal
Tlaquetzal

Reputation: 2850

Regarding the dataset. Yes, you can query two datasets. Check out this answer. Basically, you just need to indicate the project (optional), dataset and table you're using.

For the data you want to get. You can achieve it using a JOIN rather than UNION. JOINING the tables by session will allow you to have one row per session. Then you can choose what columns to include in your SELECT.

WITH abagrowth AS (
SELECT
  session,
  term abat,
  COUNT(distinct studentid) AS acount,
  ROUND(100 * (COUNT(distinct studentid) - LAG(COUNT(distinct studentid), 1) OVER (ORDER BY session)) / LAG(COUNT(distinct studentid), 1) OVER (ORDER BY session),0) || '%' AS agrowth
FROM
  aba.abaresult
GROUP BY
  1,
  2
ORDER BY
  1,
  2),

bidagrowth AS (
SELECT
  session,
  term bidat,
  COUNT(distinct studentid) AS bcount,
  ROUND(100 * (COUNT(distinct studentid) - LAG(COUNT(distinct studentid), 1) OVER (ORDER BY session)) / LAG(COUNT(distinct studentid), 1) OVER (ORDER BY session),0) || '%' AS bgrowth
FROM
  bida.bidaresult
GROUP BY
  1,
  2
ORDER BY
  1,
  2)

SELECT aba.session, aba.agrowth, bida.bgrowth
   FROM abagrowth aba
   JOIN bidagrowth bida
        ON aba.session = bida.session

UNION will stack the results from the two queries.

Upvotes: 1

Related Questions