Reputation: 329
I have the following code:
SELECT ta.application as koekkoek, ta.ipc, ipc_count/ipc_tot as ipc_share, t3.sfields FROM (
select t1.appln_id as application, t1.ipc_subclass_symbol as ipc, count(t2.appln_id) as ipc_count, sum(ipc_count) over (PARTITION BY application) as ipc_tot
FROM temp.tls209_small t1
CROSS JOIN
(SELECT appln_id, FROM temp.tls209_small group by appln_id ) t2
where t1.appln_id = t2.appln_id
GROUP BY application, ipc
) as ta
CROSS JOIN thesis.ifris_ipc_concordance t3
WHERE ta.ipc LIKE t3.ipc+'%'
AND ta.ipc NOT LIKE t3.not_ipc+'%'
AND t3.not_appln_id NOT IN
(SELECT ipc_subclass_symbol from temp.tls209_small t5 where t5.appln_id = ta.application)
Giving the folllowing error:
Field 'ta.application' not found.
I have tried numerous notations for the field, but BigQuery doesn't seem to recognize any reference to other tables in the subquery.
The purpose of the code is as to assign new technology classifications to records based on a concordance table:
I have got two tables:
One large table with application id's, classifications and some other stuff tls209_small
:
And a concordance table with some exception rules ifris_ipc_concordance
:
In the end I need to assign the sfields
label for each row in tls209
(300 million rows). The rules are that ipc_class_symbol+'%'
from the first table should be like ipc
in the second table, but not like not_ipc
.
In addition, the not_appln_id
value, if present, should not be associated with the same appln_id in the first table.
So a small example, say this is the input of the query:
appln_id | ipc_class_symbol
1 | A1
1 | A2
1 | A3
1 | C3
sfields | ipc | not_ipc | not_appln_id
X | A | A2 | null
Y | A | null | A3
appln_id 1 should get two times sfields X because ipc=A, not_ipc matches A1 and A3. Y should not be assigned at all as A3 occurs in appln_id 1.
In the results, I also need the share of the ipc_class_symbol
for a single application (1 for 328100001, 0.5 for 32100009 etc.)
Without the last condition (AND t3.not_appln_id NOT IN (SELECT ipc_subclass_symbol from temp.tls209_small t5 where t5.appln_id = ta.application)
) the query works fine:
Any suggestions on how to get the subquery to recognize the application id (ta.application), or other ways to introduce the last condition to the query?
I realize my explanation of the problem may not be very straightforward, so if anything is not clear please indicate so, I'll try to clarify the issues.
Upvotes: 1
Views: 1290
Reputation: 329
A working solution for the problem was achieved by first generating a table my matching only the ipc_class_symbol
from the first table, to the ipc
column of the second, but also including the not_ipc
, and not_appln_id
columns from the second. In addition, a list of all ipc class labels assigned to each appln_id was added using the GROUP_CONCAT
method.
Finally, with help from Pentium10, the resulting table was filtered based on the exeption rules as also discussed in this question.
In the final query, the GROUP BY and JOIN arguments needed EACH modifiers to allow the large tables to be processed:
SELECT application as appln_id, ipc as ipc_class, ipc_share, sfields as ifris_class FROM (
SELECT * FROM (
SELECT ta.application as application, ta.ipc as ipc, ipc_count/ipc_tot as ipc_share, t3.sfields as sfields, t3.ipc as yes_ipc, t3.not_ipc as not_ipc, t3.not_appln_id as exclude, t4.classes as other_classes FROM (
SELECT t1.appln_id as application, t1.ipc_class_symbol as ipc, count(t2.appln_id) as ipc_count, sum(ipc_count) over (PARTITION BY application) as ipc_tot
FROM thesis.tls209_appln_ipc t1
FULL OUTER JOIN EACH
(SELECT appln_id, FROM thesis.tls209_appln_ipc GROUP EACH BY appln_id ) t2
ON t1.appln_id = t2.appln_id
GROUP EACH BY application, ipc
) AS ta
LEFT JOIN EACH (
SELECT appln_id, GROUP_CONCAT(ipc_class_symbol) as classes FROM [thesis.tls209_appln_ipc]
GROUP EACH BY appln_id) t4
ON ta.application = t4.appln_id
CROSS JOIN thesis.ifris_ipc_concordance t3
WHERE ta.ipc CONTAINS t3.ipc
) as tx
WHERE (not ipc contains not_ipc or not_ipc is null)
AND (not other_classes contains exclude or exclude is null or other_classes is null)
)
Upvotes: 1
Reputation: 207828
The query you're performing is doing an anti-join. You can re-write this as an explicit join, but it is a little verbose:
SELECT *
FROM [x.z] as z
LEFT OUTER JOIN EACH [x.y] as y ON y.appln_id = z.application
WHERE y.not_appln_id is NULL
Upvotes: 1