peterko

Reputation: 533

Performance selecting from an External Rowset

I am testing the performance of different ways to select from an external database source:

  1. EXTERNAL Datasource_Identifier
  2. LOCATION csharp_string_literal
  3. EXECUTE csharp_string_literal

I'm interested in performance because only the third form (EXECUTE) is efficient when the query has a WHERE clause.

Am I doing something wrong, or is it normal that U-SQL first reads all rows from the external table and only then filters them inside ADLA (LOCATION behaves the same way)?

That is a problem, and inefficient, when my table is very large and I only need a subset of its rows.

Can I force U-SQL to filter the data before reading it from the EXTERNAL table or from LOCATION? The catch is that I need a dynamic WHERE clause based on a variable.
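To make the difference concrete, here is a minimal Python sketch that uses an in-memory sqlite3 database as a stand-in for the remote SQL Server. It is only an analogy, not how U-SQL or ADLA actually talk to an external source. `read_all_then_filter` ships every row and filters on the client, like the behaviour described for EXTERNAL and LOCATION. `push_down` sends the WHERE clause to the source with a bound parameter, like a remoted predicate, so the filter value can still come from a runtime variable. The table `orders`, its columns, and both helper names are hypothetical.

```python
import sqlite3

# Hypothetical stand-in for the remote SQL Server table (analogy only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT)")
# 1000 rows: every tenth row belongs to region "EU", the rest to "US".
conn.executemany(
    "INSERT INTO orders (region) VALUES (?)",
    [("EU" if i % 10 == 0 else "US",) for i in range(1000)],
)


def read_all_then_filter(region):
    """Transfer every row from the source, then filter on the client side."""
    rows = conn.execute("SELECT id, region FROM orders ORDER BY id").fetchall()
    matches = [row for row in rows if row[1] == region]
    return len(rows), matches


def push_down(region):
    """Send the predicate to the source so only matching rows are transferred.

    The value is bound as a query parameter, so the WHERE clause can still be
    driven by a variable chosen at runtime.
    """
    rows = conn.execute(
        "SELECT id, region FROM orders WHERE region = ? ORDER BY id",
        (region,),
    ).fetchall()
    return len(rows), rows


region = "EU"  # the "dynamic" filter value
scanned, local_matches = read_all_then_filter(region)
transferred, remote_matches = push_down(region)
print(scanned, transferred)  # 1000 100
```

Both functions return the same 100 rows; the first one moves all 1,000 rows across the connection to get them, which is exactly the cost that grows with a very large table.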

Upvotes: 1

Views: 93

Answers (1)

Michael Rys

Reputation: 6684

First, you control whether predicates can be pushed down to your SQL Server engine with the REMOTABLE_TYPES clause on your DATA SOURCE object.

Second, the predicate itself needs to be remotable. If your predicate involves a join with a U-SQL rowset (table), it may not be easy to remote it efficiently (I am not sure whether we map a join into a semijoin yet).
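The two conditions above can be summarised with a toy Python model. This is an illustration only, not U-SQL's actual optimizer logic: it assumes a predicate is pushed to the source only if the type of the value it compares appears in the data source's REMOTABLE_TYPES list and it does not reference a local U-SQL rowset; everything else is evaluated after the rows have been read. The `Predicate` class, its fields, `split_predicates`, and the example REMOTABLE_TYPES set are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    """Hypothetical description of one conjunct of a WHERE clause."""
    text: str
    value_type: str  # C# type name of the compared value, e.g. "int"
    references_local_rowset: bool = False  # e.g. a join with a U-SQL rowset


def split_predicates(predicates, remotable_types):
    """Toy rule: remote a predicate only if its type is listed in
    REMOTABLE_TYPES and it does not depend on a local U-SQL rowset."""
    remote, local = [], []
    for p in predicates:
        if p.value_type in remotable_types and not p.references_local_rowset:
            remote.append(p)
        else:
            local.append(p)
    return remote, local


# Assumed REMOTABLE_TYPES configuration for the example (not a default).
remotable = {"int", "string", "DateTime"}
where = [
    Predicate('Region == "EU"', "string"),
    Predicate("Amount > 100.0", "double"),
    Predicate("o.CustomerId == l.CustomerId", "int", references_local_rowset=True),
]
remote, local = split_predicates(where, remotable)
print([p.text for p in remote])  # ['Region == "EU"']
print([p.text for p in local])
```

Under this toy rule, adding `double` to the remotable set would move the `Amount` predicate to the remote side, while the join with the local rowset would still be evaluated after reading.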

Since you seem to be able to remote the predicate you use with EXECUTE, there is a good chance that you could write the queries for the other cases in a way that lets them be remoted as well. But without seeing the queries, it is hard to say for sure.

If you want us to take a look, please contact me by email (usql at microsoft dot com).

Upvotes: 2
