Edzio Edziowski

Reputation: 21

Databricks Statistics on delta table

Do you think generating column statistics makes sense in Delta Lake? Does it optimize joins and aggregations, or do only the statistics inside _delta_log count?


I tried to find out whether statistics influence performance by searching a lot of internet resources, but found no answer.

Upvotes: 0

Views: 3096

Answers (1)

Powers

Reputation: 19328

Delta Lake metadata statistics can be used to optimize different types of queries. Here are examples of queries that can run faster by leveraging the table metadata:

  • select count(*) from the_table
  • select * from the_table limit 1

See this blog post for more information.
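To illustrate why a query like `select count(*)` can be answered from metadata alone, here is a minimal Python sketch. It assumes the usual layout of the Delta transaction log, where each `add` action carries a JSON-encoded `stats` string with a `numRecords` field. The log lines and file names below are made up for illustration, and the sketch ignores `remove` actions and checkpoints, which a real reader has to handle.

```python
import json

# Hypothetical commit-file lines from a _delta_log directory (illustrative data only).
# In the Delta log, "stats" is itself a JSON-encoded string inside the "add" action.
log_lines = [
    json.dumps({"commitInfo": {"operation": "WRITE"}}),
    json.dumps({"add": {
        "path": "part-0000.parquet",
        "stats": json.dumps({"numRecords": 100,
                             "minValues": {"id": 1},
                             "maxValues": {"id": 100},
                             "nullCount": {"id": 0}}),
    }}),
    json.dumps({"add": {
        "path": "part-0001.parquet",
        "stats": json.dumps({"numRecords": 250,
                             "minValues": {"id": 101},
                             "maxValues": {"id": 350},
                             "nullCount": {"id": 0}}),
    }}),
]


def count_from_metadata(lines):
    """Sum numRecords over all add actions without opening any data file."""
    total = 0
    for line in lines:
        action = json.loads(line)
        add = action.get("add")
        if add is None:
            continue  # skip commitInfo and other non-add actions
        stats = json.loads(add["stats"])
        total += stats["numRecords"]
    return total


row_count = count_from_metadata(log_lines)
print(row_count)
```

The point is only that the row count is already recorded per file, so no Parquet data needs to be scanned.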

Column-level statistics (the per-file min/max values stored in the Delta log) can speed up any operation where entire files can be skipped. Reading less data generally makes queries run faster.
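A rough sketch of the file-skipping idea, in plain Python rather than Spark. The `FileStats` class, the `files_to_read` helper, and the file names are hypothetical and only model the concept: a file can be skipped when its min/max range for a column cannot overlap the filter.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Per-file min/max for one column (hypothetical model of Delta log stats)."""
    path: str
    min_value: int
    max_value: int


def files_to_read(files, lo, hi):
    """Keep only files whose [min, max] range overlaps the filter lo <= col <= hi."""
    return [f for f in files if f.max_value >= lo and f.min_value <= hi]


files = [
    FileStats("part-0000.parquet", 1, 100),
    FileStats("part-0001.parquet", 101, 200),
    FileStats("part-0002.parquet", 201, 300),
]

# Filter equivalent to: WHERE id BETWEEN 150 AND 160
selected = files_to_read(files, 150, 160)
print([f.path for f in selected])
```

With these made-up ranges, two of the three files never need to be opened. The benefit depends on how well the data is clustered on the filter column: if every file spans the full value range, nothing gets skipped.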

Upvotes: 0
