Bigquery - reduce query execution time

Question

I have bigquery table with the following structure:

source | feature |  value |
---------------------------
s1     |    f1   |  HIGH  |
---------------------------
s2     |    f1   |  LOW   |
---------------------------
s2     |    f2   | MEDIUM |
.
.
.
---------------------------
s7000 |   f1000 |  HIGH   |

I would like to count how much sources appears for all permutations of features and values. Example result set:

feature  |  feature  | value  | value  |  count
-----------------------------------------------
    f1   |    f2     |  HIGH  | HIGH   |   7
-----------------------------------------------
    f1   |    f2     |  LOW   | HIGH   |  211
.
.
.
-----------------------------------------------
  f7000  |  f1000    |  HIGH  |MEDIUM |  123

This is my sql query:

select     m1.feature as f1, m2.feature as f2, m1.value as v1, m2.value as v2, count(*)
from       matrix  m1 inner join matrix m2 on m1.source=m2.source
group by   m1.feature, m2.feature, m1.value, m2.value

The problem is very poor performance of bigquery in this case. Matrix table is not so big, about 7,000,000 records. I stopped the query execution after 7 minutes...

Could you help me to improve the performance of my query?

Bigquery - reduce query execution time

Answers (1)

Related Questions