Reputation: 110143
The intro to BQ BigLake mentions three features: row filtering, predicate pushdown, and fast scans. What exactly do these three things mean in practice?
I believe predicate pushdown is the ability to evaluate the WHERE expression directly on the data source. For example, if I were running the following query on a MySQL table:
SELECT * FROM table WHERE country = 'US'
predicate pushdown would mean taking the country = 'US' predicate and evaluating it at the native source, whereas without predicate pushdown (in this made-up example), it would involve running SELECT * FROM table -- copying all the data over to the processing server -- and then evaluating the predicate there. But if this is correct, how would predicate pushdown work on, for example, a CSV file?
Also, what would row filter evaluation be? It seems like predicate pushdown and row filter evaluation would be the same thing, but maybe one of these means something more along the lines of "column filter" -- i.e., only bringing over data from the columns that are needed.
And finally, what are 'fast scans'? I suppose this just means parallelized reading of multiple files, or of a file that can be read in parallel (such as a newline-delimited JSON file). Is the actual reading of the file any different than if I were, for example, to copy the file over with a CLI tool such as $ aws s3 cp s3://mybucket/myfile .?
Upvotes: 0
Views: 1096
Reputation: 7947
I will try to address each feature independently.
row filtering
Also, what would row filter evaluation be? It seems like predicate pushdown and row filter evaluation would be the same thing, but maybe one of these means something more along the lines of "column filter" -- i.e., only bringing over data from the columns that are needed.
The BigQuery security model lets you grant access at the row level based on filter conditions. Row filtering refers to the BigLake engine's capability to enforce those row-level access-control rules at runtime, while the data is being read, which is where your defined policies are applied. This is a useful strategy for data governance because you can define multiple rules on a single table depending on your needs, avoiding data duplication and the creation of unnecessary views.
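As a concrete illustration, such a rule is created with a CREATE ROW ACCESS POLICY DDL statement. A minimal sketch using the google-cloud-bigquery Python client (the policy, dataset, table, column, and group names here are all made up):

from google.cloud import bigquery

# Create a policy so that members of the (hypothetical) group only
# see rows where country = 'US'; the engine enforces this at read time.
client = bigquery.Client()
client.query(
    """
    CREATE ROW ACCESS POLICY us_only
    ON mydataset.mytable
    GRANT TO ('group:us-analysts@example.com')
    FILTER USING (country = 'US')
    """
).result()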
predicate pushdown
Predicate pushdown would mean taking the country = 'US' predicate and evaluating it at the native source, whereas without predicate pushdown (in this made-up example), it would involve running SELECT * FROM table -- copying all the data over to the processing server -- and then evaluating the predicate there. But if this is correct, how would predicate pushdown work on, for example, a CSV file?
I am not sure what you mean by applying the predicate (the WHERE clause or filter) at the source; as far as I know this is not possible, especially in a distributed/cloud environment where storage and compute are usually decoupled.
Predicate pushdown refers to the capability of applying the predicate in the BQ vectorized runtime instead of in the BigQuery Storage API layer or the client, which reduces the amount of data to be processed and sent over the network. This is also especially useful when you have complex queries involving joins, which would require more computation if the filters were applied after the join operations instead of before.
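One place where this pushdown becomes visible to a consumer is the BigQuery Storage Read API, which lets you attach a row filter and a column list to a read session so the scan itself applies them before any data crosses the network. A minimal sketch with the Python client (row_restriction and selected_fields are real API fields; the project, dataset, and table names are made up):

from google.cloud import bigquery_storage
from google.cloud.bigquery_storage import types

client = bigquery_storage.BigQueryReadClient()

# The filter and the column list are pushed into the scan: only
# matching rows and requested columns are sent back to the client.
session = client.create_read_session(
    parent="projects/my-project",
    read_session=types.ReadSession(
        table="projects/my-project/datasets/mydataset/tables/mytable",
        data_format=types.DataFormat.ARROW,
        read_options=types.ReadSession.TableReadOptions(
            selected_fields=["country", "name"],
            row_restriction="country = 'US'",
        ),
    ),
    max_stream_count=1,
)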
fast scans
And finally, what are 'fast scans'? I suppose this just means parallelized reading of multiple files, or of a file that can be read in parallel (such as a newline-delimited JSON file). Is the actual reading of the file any different than if I were, for example, to copy the file over with a CLI tool such as $ aws s3 cp s3://mybucket/myfile .?
This refers to the fact that data can be consumed through the BigQuery Storage API instead of the "conventional" BigQuery API, which returns responses in a paginated fashion. More detail can be found in the BigQuery Storage Read API documentation. The advantage is that, as you mention, it reads the source data in parallel (which it will always do from distributed storage systems), but it also lets you process the data in a more efficient (distributed/parallel) way depending on how you consume it, e.g. with Dataflow, Apache Spark, or TensorFlow.
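As a sketch of what that parallel consumption can look like with the Python client (the project, dataset, and table names are made up; each stream could just as well be handed to a separate worker in Spark or Dataflow):

from concurrent.futures import ThreadPoolExecutor
from google.cloud import bigquery_storage
from google.cloud.bigquery_storage import types

client = bigquery_storage.BigQueryReadClient()
session = client.create_read_session(
    parent="projects/my-project",
    read_session=types.ReadSession(
        table="projects/my-project/datasets/mydataset/tables/mytable",
        data_format=types.DataFormat.ARROW,
    ),
    max_stream_count=4,  # ask for up to 4 parallel streams
)

def count_rows(stream):
    # Each stream is an independent, server-side partition of the scan.
    reader = client.read_rows(stream.name)
    return sum(page.num_items for page in reader.rows(session).pages)

# Read all streams concurrently instead of paging through one cursor.
with ThreadPoolExecutor() as pool:
    total = sum(pool.map(count_rows, session.streams))
print(total)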
Upvotes: 2