Reputation: 11
I am currently working with DBs in Snowflake, and I have a performance problem.
In my query, I need to retrieve 100 rows of data from a MAIN_TABLE which contains foreign keys to other tables. MAIN_TABLE size is between 50K-300K rows.
This MAIN_TABLE contains ids (foreign keys) of type NUMBER(38,0), and the description of those fields/columns is retrieved from other tables with a straightforward INNER JOIN.
The secondary tables (table_one, table_two...) are very small, all of them have less than 20 rows, except one, that has around 1500 rows (still very small)
The problem is:
Running the query, as shown below, takes an average of 1.7s to complete. If I delete all the DESC fields (retrieved using INNER JOINs), and all the INNER JOINs, the query takes an average of 300ms to complete.
The current solution my peers implemented:
They have cached the information of all the secondary tables in our app (made in Java, but that is irrelevant). For about 80% of the cases (see query below), that we do not need to filter by DESCription, this works, but on the other 20% of cases we still have to perform the INNER JOINs.
Downsizes of the solution:
Is not a bad solution, but it has two problems:
One possibility that comes to mind is caching those tables in SF, but I haven't found a straightforward way to do it yet. Maybe there is a way to optimize the query, but I do not yet understand how Snowflake works internally, as far as I know it does not uses Indexes, at least not in the same way other platforms do.
So, is there a way to optimize the query in SF for 100% of the cases?
SELECT
main_table.ONE_ID,
main_table.TWO_ID,
main_table.THREE_ID,
main_table.FOUR_ID,
main_table.FIVE_ID,
main_table.SIX_ID,
main_table.SEVEN_ID,
field_one.ONE_DESC,
field_two.TWO_DESC,
field_three.THREE_DESC,
field_four.FOUR_DESC,
field_five.FIVE_DESC,
field_six.SIX_DESC,
field_seven.SEVEN_DESC
FROM
SOME_DATABASE.MAIN_TABLE AS main_table
INNER JOIN SOME_DATABASE.TABLE_ONE AS table_one ON main_table.field_one_id = table_one .ONE_ID
INNER JOIN SOME_DATABASE.TABLE_TWO AS table_two ON main_table.field_two_id = table_two .TWO_ID
INNER JOIN SOME_DATABASE.TABLE_THREE AS table_tree ON main_table.field_tree_id = table_tree .THREE_ID
INNER JOIN SOME_DATABASE.TABLE_FOUR AS table_four ON main_table.field_four_id = table_four .FOUR_ID
INNER JOIN SOME_DATABASE.TABLE_FIVE AS table_five ON main_table.field_five_id = table_five .FIVE_ID
INNER JOIN SOME_DATABASE.TABLE_SIX AS table_six ON main_table.field_six_id = table_six .SIX_ID
INNER JOIN SOME_DATABASE.TABLE_SEVEN AS table_seven ON main_table.field_seven_id = table_seven .SEVEN_ID
WHERE
main_table.ONE_ID IN (25, 26)
AND main_table.TWO_ID IN (10, 12)
AND main_table.THREE_ID IN (1, 2, 3)
AND main_table.FOUR_ID IN (2, 3)
AND main_table.FIVE_ID IN (3)
AND main_table.SEVEN_ID IN (1)
-- The following WHERE clauses are present in about 20% of the queries
AND table_one.ONE_DESC,
AND table_two.TWO_DESC,
AND table_three.THREE_DESC,
ORDER BY
main_table.ONE_ID,
main_table.TWO_ID,
main_table.THREE_ID
LIMIT
100 OFFSET 0
Small update:
I have been trying using WITH clauses, both wrapping the ids and the description, but there does not seem to be an improvement
Upvotes: 1
Views: 5373
Reputation: 25903
My first though is, snowflake is not designed to ask tiny queries like this fast, it made to answer big queries. Thus to me 1.xs is fine response time.
But given you would like it to be faster..
I would first check to if the slow query's compile time is the same as the fast. Because if your tables are fragmented more meta data will need to be read.
Next I would look at profile of the execution, to see where the time is spent. In the past we have found things like
SELECT columnA, columnB FROM table where ID == 1
UNION ALL
SELECT columnA, columnB FROM table where ID == 2
to be faster than
SELECT columnA, columnB FROM table where ID in (1,2)
oh huge SQL statements we have found Gordon's point of explicitly putting WHERE clauses at the time of the SELECT from a table helps where the optimizer does get confused.
But another aspect might just be the extra data takes longer to transmit to you client (if you are using client time as the judge) or if you looking at the performance in the WebUI, the more data you read on the longer it takes to run the query. So even if you build a new table
CREATE TABLE testo AS
SELECT
main_table.ONE_ID,
main_table.TWO_ID,
main_table.THREE_ID,
main_table.FOUR_ID,
main_table.FIVE_ID,
main_table.SIX_ID,
main_table.SEVEN_ID,
table_one.ONE_DESC,
table_two.TWO_DESC,
table_three.THREE_DESC,
table_four.FOUR_DESC,
table_five.FIVE_DESC,
table_six.SIX_DESC,
table_seven.SEVEN_DESC
FROM
SOME_DATABASE.MAIN_TABLE AS main_table
INNER JOIN SOME_DATABASE.TABLE_ONE AS table_one
ON main_table.field_one_id = table_one.ONE_ID
INNER JOIN SOME_DATABASE.TABLE_TWO AS table_two
ON main_table.field_two_id = table_two.TWO_ID
INNER JOIN SOME_DATABASE.TABLE_THREE AS table_tree
ON main_table.field_tree_id = table_tree.THREE_ID
INNER JOIN SOME_DATABASE.TABLE_FOUR AS table_four
ON main_table.field_four_id = table_four.FOUR_ID
INNER JOIN SOME_DATABASE.TABLE_FIVE AS table_five
ON main_table.field_five_id = table_five.FIVE_ID
INNER JOIN SOME_DATABASE.TABLE_SIX AS table_six
ON main_table.field_six_id = table_six.SIX_ID
INNER JOIN SOME_DATABASE.TABLE_SEVEN AS table_seven
ON main_table.field_seven_id = table_seven.SEVEN_ID
and then do
ONE_ID,
TWO_ID,
THREE_ID,
FOUR_ID,
FIVE_ID,
SIX_ID,
SEVEN_ID,
ONE_DESC,
TWO_DESC,
THREE_DESC,
FOUR_DESC,
FIVE_DESC,
SIX_DESC,
SEVEN_DESC
FROM testo
WHERE
ONE_ID IN (25, 26)
AND TWO_ID IN (10, 12)
AND THREE_ID IN (1, 2, 3)
AND FOUR_ID IN (2, 3)
AND FIVE_ID IN (3)
AND SEVEN_ID IN (1)
will take longer than:
ONE_ID,
TWO_ID,
THREE_ID,
FOUR_ID,
FIVE_ID,
SIX_ID,
SEVEN_ID
FROM testo
WHERE
ONE_ID IN (25, 26)
AND TWO_ID IN (10, 12)
AND THREE_ID IN (1, 2, 3)
AND FOUR_ID IN (2, 3)
AND FIVE_ID IN (3)
AND SEVEN_ID IN (1)
and finally, I know it's example code, but there is little point aliasing a table from main_table
to main_table
as that was the alias/name already in place.
Upvotes: 1
Reputation: 980
Have you by chance tried to redesign your dictionary tables? As in you have a:
and you can try make it into
this time instead of 5 small tables you'll have 1 slightly bigger and a bit more complex join conditions because you'd have to add dictionary ID to each of them, but still all of them are likely going to be retrievable in a single block of data instead of 5 blocks
Upvotes: 0
Reputation: 1269443
I'm just curious if it would make a difference if you phrased the query with the filtering on the main table in subqueries:
FROM (SELECT *
FROM main_table
WHERE main_table.ONE_ID IN (25, 26) AND
main_table.TWO_ID IN (10, 12) AND
main_table.THREE_ID IN (1, 2, 3) AND
main_table.FOUR_ID IN (2, 3) AND
main_table.FIVE_ID IN (3) AND
main_table.SEVEN_ID IN (1)
) main_table JOIN
. . .
One of the downsides to tremendous scalability is that the methods used can add overhead to smaller queries. That is, if you multiplied your table sizes by 1000, the query might still only take a few seconds -- but that doesn't help you eke out faster times on the smaller data.
Upvotes: 1