Reputation: 133
I have bigquery table with the following structure:
source | feature | value |
---------------------------
s1 | f1 | HIGH |
---------------------------
s2 | f1 | LOW |
---------------------------
s2 | f2 | MEDIUM |
.
.
.
---------------------------
s7000 | f1000 | HIGH |
I would like to count how much sources appears for all permutations of features and values. Example result set:
feature | feature | value | value | count
-----------------------------------------------
f1 | f2 | HIGH | HIGH | 7
-----------------------------------------------
f1 | f2 | LOW | HIGH | 211
.
.
.
-----------------------------------------------
f7000 | f1000 | HIGH |MEDIUM | 123
This is my sql query:
select m1.feature as f1, m2.feature as f2, m1.value as v1, m2.value as v2, count(*)
from matrix m1 inner join matrix m2 on m1.source=m2.source
group by m1.feature, m2.feature, m1.value, m2.value
The problem is very poor performance of bigquery in this case. Matrix table is not so big, about 7,000,000 records. I stopped the query execution after 7 minutes...
Could you help me to improve the performance of my query?
Upvotes: 0
Views: 319
Reputation: 173190
Few recommendations:
WHERE m1.feature < m2.feature
CREATE TABLE AS
or INSERT INTO
to suppress loading result to UI and rather get it into table that then you can use/query at your will Upvotes: 3