Reputation: 71
I am trying to store my data in a nested way in Parquet, using a map-type column to store complex objects as values.
Could somebody let me know whether filter pushdown works on map-type columns? For example, below is my SQL query:
`select measureMap['CR01'].tenorMap['1M'] from RiskFactor where businessDate='2016-03-14' and bookId='FI-UK'`
measureMap is a map with a String key and a value of a custom data type containing two attributes: a String and another map of String to Double pairs.
I want to know whether pushdown will work on a map: if a map has 10 key-value pairs, will Spark bring the whole map's data into memory and build the object model, or will it filter the data by key at the I/O read level?
I also want to know whether there is any way to specify a key in the where clause, something like `where measureMap.key = 'CR01'`?
Upvotes: 6
Views: 1512
Reputation: 6994
The short answer is no. Parquet predicate pushdown doesn't work with MapType columns or with nested Parquet structures.
Spark's Catalyst optimizer only understands the top-level columns in the Parquet data. It uses the column type, the column's data range, the encoding, etc. to generate the whole-stage code for the query.
When the data is in a MapType column, that information isn't available per key. A map could hold hundreds of key-value pairs, and the current Spark infrastructure cannot push a predicate down to an individual key.
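To make the difference concrete, here is a rough toy model in plain Python. It is not Spark or Parquet code: `RowGroup`, `scan`, and the sample values are all hypothetical, and the statistics are reduced to a min/max on one top-level column. Under those assumptions it shows a filter on `bookId` skipping a whole row group, while the map lookup has to build the full map first.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    # Hypothetical stand-in for a Parquet row group: min/max statistics
    # for one top-level column, plus the rows themselves.
    min_book: str
    max_book: str
    rows: list  # each row: {"bookId": str, "measureMap": dict}


def scan(row_groups, book_id, map_key):
    """Return (values, row_groups_read, maps_materialized)."""
    values, groups_read, maps_built = [], 0, 0
    for rg in row_groups:
        # Top-level predicate: the min/max range lets us skip the
        # row group without reading any of its rows.
        if not (rg.min_book <= book_id <= rg.max_book):
            continue
        groups_read += 1
        for row in rg.rows:
            if row["bookId"] != book_id:
                continue
            # Map lookup: this model has no per-key statistics, so the
            # whole map is materialized before the key is looked up.
            whole_map = dict(row["measureMap"])
            maps_built += 1
            if map_key in whole_map:
                values.append(whole_map[map_key])
    return values, groups_read, maps_built


groups = [
    RowGroup("FI-UK", "FI-UK",
             [{"bookId": "FI-UK", "measureMap": {"CR01": 1.5, "IR01": 2.0}}]),
    RowGroup("FX-US", "FX-US",
             [{"bookId": "FX-US", "measureMap": {"CR01": 9.9}}]),
]

result = scan(groups, "FI-UK", "CR01")
print(result)  # ([1.5], 1, 1)
```

Only one of the two row groups is read because of the `bookId` range check, but for every surviving row the entire map is built even though a single key is needed.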
Upvotes: 1