Reputation: 21
Does generating column-level statistics make sense in Delta Lake? Do they optimize joins and aggregations, or do only the statistics stored inside _delta_log count?
I tried to find out whether statistics influence performance and searched a lot of internet resources, but found no answer.
Upvotes: 0
Views: 3096
Reputation: 19328
Delta Lake metadata statistics can be used to optimize different types of queries. Here are examples of queries that can run faster by leveraging the table metadata:
select count(*) from the_table
select * from the_table limit 1
See this blog post for more information.
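To make the count(*) case concrete, here is a minimal Python sketch of the idea. It assumes each "add" action in a _delta_log JSON file carries a "stats" string with a "numRecords" field; the paths and numbers are invented, and the helper name count_from_log is hypothetical. It ignores "remove" actions and checkpoints, so treat it as an illustration rather than how the engine actually does it.

```python
import json

# Hypothetical lines shaped like _delta_log "add" actions (all values invented).
log_lines = [
    json.dumps({"add": {"path": "part-0000.parquet",
                        "stats": json.dumps({"numRecords": 100})}}),
    json.dumps({"add": {"path": "part-0001.parquet",
                        "stats": json.dumps({"numRecords": 250})}}),
    json.dumps({"commitInfo": {"operation": "WRITE"}}),
]

def count_from_log(lines):
    """Sum numRecords over all add actions without opening any data file."""
    total = 0
    for line in lines:
        add = json.loads(line).get("add")
        if add is None:
            continue  # skip non-add actions such as commitInfo
        stats = json.loads(add["stats"])  # stats is itself a JSON-encoded string
        total += stats["numRecords"]
    return total

row_count = count_from_log(log_lines)
print(row_count)
```

The point is that the row count comes from a few small JSON entries, so no Parquet file has to be read.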
Column-level statistics can speed up any operation in which entire files can be skipped, because reading less data generally makes queries run faster.
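File skipping can be sketched in a few lines of Python. This assumes each data file has a recorded minimum and maximum for the filtered column; the FileStats class, the files_to_read helper, and all values are hypothetical and only illustrate the pruning logic, not Delta Lake's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class FileStats:
    path: str
    min_value: int  # smallest value of the column in this file
    max_value: int  # largest value of the column in this file

def files_to_read(files, lower, upper):
    """Keep only files whose [min, max] range overlaps [lower, upper]."""
    return [f.path for f in files
            if f.max_value >= lower and f.min_value <= upper]

# Invented per-file statistics for a single integer column.
files = [
    FileStats("part-0000.parquet", 1, 100),
    FileStats("part-0001.parquet", 101, 200),
    FileStats("part-0002.parquet", 201, 300),
]

# A filter such as "col between 150 and 180" only needs the second file.
selected = files_to_read(files, 150, 180)
print(selected)
```

If the value ranges of the files overlap heavily, few files can be ruled out this way, so how much the statistics help depends on how the data is laid out.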
Upvotes: 0