Shrulik

Reputation: 284

WITH vs Consumer Groups - performance and other considerations

What are the performance implications of a WITH clause? Should I rely on a WITH clause as much as possible?

For example, if I have queries that look like this:

Select * from Input1 i where i.type = 'something'

Select * from Input1 i where i.type = 'something-else'

Select * from Input1 i where i.type = 'something-else' and i.cost > 500 

Select * from Input1 i where i.size < 10

a. Should I have a WITH on everything from Input1 just to limit the readers, and do the other filtering as a second step? I'm guessing this would hurt performance.

b. Shouldn't I just create a consumer group per query? Why not?

c. A bit different: is there a relation between the performance of different outputs? Does it matter if I have several outputs, where some are high-throughput Cosmos DB collections and some are Table storage (though the Table storage is much better partitioned)? Would it be better to split the two into different (input, consumer group) pairs, or even completely different ASA jobs?

Upvotes: 0

Views: 59

Answers (1)

Jean-Sébastien

Reputation: 737

TL;DR: for most typical workloads, you should be fine with a WITH statement to reduce the number of receivers. If you want to fine-tune performance, you can create different inputs with different receivers.

Please find the detailed answers below:

a) For the first question, using WITH and then doing the filtering in that step will not impact performance for this query.
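As a rough sketch of what that could look like for the four queries in the question: a single WITH step reads Input1, and each SELECT filters from that step instead of from the input directly. The step name AllEvents and the output aliases (OutputSomething, OutputSomethingElse, OutputExpensive, OutputSmall) are placeholders invented for this example, not names from the original job.

```sql
-- Hypothetical step name: the only place the job reads Input1 directly.
WITH AllEvents AS (
    SELECT * FROM Input1
)

-- Each statement below filters the shared step.
-- Output aliases are placeholders; use the outputs defined on your job.
SELECT * INTO OutputSomething
FROM AllEvents i
WHERE i.type = 'something'

SELECT * INTO OutputSomethingElse
FROM AllEvents i
WHERE i.type = 'something-else'

SELECT * INTO OutputExpensive
FROM AllEvents i
WHERE i.type = 'something-else' AND i.cost > 500

SELECT * INTO OutputSmall
FROM AllEvents i
WHERE i.size < 10
```

Whether the filtering lives in the WITH step or in the statements that select from it, the point is the same: Input1 appears in only one FROM clause.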

b) With the WITH statement, you can reduce the number of receivers needed. However, if you need a larger number of receivers, you will have to define several inputs, with a different consumer group for each input. The pros and cons of the two approaches depend on factors such as the degree of parallelization/partitioning of the query, the volume and distribution of the data, etc. If you have a very high throughput, you may have to experiment and use performance numbers and the "job diagram" to guide your decisions. Note that for most current jobs, you should be fine with the WITH statement.
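For comparison, a minimal sketch of the several-inputs approach. It assumes two job inputs, here called Input1GroupA and Input1GroupB (hypothetical aliases), that point at the same source but are each configured with their own consumer group. The consumer group is set in the input's configuration, not in the query, so the query only differs in which input alias each statement reads from. The output aliases are placeholders as well.

```sql
-- Input1GroupA and Input1GroupB are hypothetical input aliases:
-- same source, each configured with a different consumer group.

SELECT * INTO OutputSomething
FROM Input1GroupA i
WHERE i.type = 'something'

SELECT * INTO OutputSmall
FROM Input1GroupB i
WHERE i.size < 10
```

Each statement now has its own receivers, at the cost of more inputs and consumer groups to manage.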

c) If the pipelines are completely independent, you may want to create different jobs to maximize performance. You can use the "job diagram" of your ASA job to visualize the job topology and optimize it.

Let me know if it answers your question.

Thanks,

JS

Upvotes: 1
