Reputation: 2967
I came across Hortonworks' blog post advocating for predicate push-down in this post.
I can't find it in the Spark 1.4 documentation, though (that's the version I'm using). Do I need to worry about setting this to false, or is it already enabled natively? If I can change this, how do I do it?
Upvotes: 1
Views: 413
Reputation: 12991
Predicate pushdown is part of Spark's Catalyst optimizer, and it happens automatically.
For example, let's say you create a dataframe from a SQL server and then apply a filter to it. Performance would probably be better if the filtering were done in the SQL server rather than in Spark (to reduce the amount of traffic on the network). Spark's Catalyst engine recognizes that the JDBC source supports predicate pushdown and reorganizes your query to do so.
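To see why this matters, here is a plain-Python sketch of the idea (using the standard-library `sqlite3` module as a stand-in "remote" database, not Spark itself): filtering client-side transfers every row, while pushing the predicate into the SQL query transfers only the matching ones. The table name and values are made up for illustration.

```python
import sqlite3

# Build a tiny in-memory database standing in for a remote SQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, level TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(i, "ERROR" if i % 100 == 0 else "INFO") for i in range(10000)],
)

# Without pushdown: fetch every row over the "network", filter locally.
all_rows = conn.execute("SELECT id, level FROM events").fetchall()
errors_client = [row for row in all_rows if row[1] == "ERROR"]

# With pushdown: the predicate travels to the database, so only the
# matching rows are transferred back.
errors_pushed = conn.execute(
    "SELECT id, level FROM events WHERE level = 'ERROR'"
).fetchall()

assert errors_client == errors_pushed
print(len(all_rows), len(errors_pushed))  # 10000 vs 100 rows transferred
```

Catalyst applies the same reasoning to a JDBC dataframe: a `df.filter(...)` on a supported column is translated into a `WHERE` clause sent to the database. You can confirm this in Spark by calling `df.filter(...).explain()` and looking for the pushed filter in the physical plan.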
In the specific example of the article, it only says that the ORC source supports predicate pushdown in specific cases (i.e. when it has built-in indexes).
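For the ORC case specifically, there is a configuration flag you can set explicitly; in the Spark 1.4-era releases, ORC filter pushdown was gated behind `spark.sql.orc.filterPushdown` (off by default in those versions, if I recall correctly), which you can enable on the `SQLContext`:

```python
# Enable ORC predicate pushdown (flag name as of the Spark 1.4/1.5 era;
# check the SQL configuration docs for your exact version).
sqlContext.setConf("spark.sql.orc.filterPushdown", "true")
```

For JDBC and Parquet sources, no such step is needed; pushdown is applied automatically by Catalyst.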
This is not something you need to worry about in 99.9% of cases; it simply improves performance behind the scenes.
Upvotes: 1