HimalayanNinja
HimalayanNinja

Reputation: 421

In Azure Synapse, is it true that creating a temporary table is discouraged?

I had a good discussion with one of my colleagues and he mentioned creating a temporary table degrades the performance in Azure Synapse because Synapse creates the temporary table first in the master node then distribute them to child node. Is it true? He recommended me to create create permanent table instead of temporary table.

Upvotes: 2

Views: 1955

Answers (1)

GregGalloway
GregGalloway

Reputation: 11625

That’s not correct. Temp tables don’t necessarily funnel through the control node. Let’s say you are selecting from a table distributed on ProductKey and loading it into a #temp table distributed on ProductKey. The data will never leave each compute node since it’s a distribution compatible insert.

On the other hand, if you run a query that uses a ROW_NUMBER function, for example, that would have to be calculated on the control node and then the data would be sent back to the compute nodes to be stored in the distributed temp table. But that only happens in the presence of some types of functions and some types of queries. It is not the norm. If you are worried about a particular query then add the word EXPLAIN to the front of it and paste the explain plan XML into your question so we can help you interpret it.

If you load a #temp table with a SELECT INTO statement you can’t specify the table geometry so it will be a round robin distributed columnstore. Usually this isn’t ideal since it takes extra time and memory to compress a columnstore and because round robin distribution isn’t ideal unless there is no good distribution key. Usually the next query which uses the round robin distributed temp table will just reshuffle it so it’s best to properly hash distribute a temp table initially. To do this do a CTAS statement as described here.

Upvotes: 4

Related Questions