Reputation: 20485
OK, so after getting exceptions about not being able to write keys into a Parquet file via Spark, I looked into the API and found only this:
public class ParquetOutputFormat<T> extends FileOutputFormat<Void, T> {....
(My assumption could be wrong =D, and there might be another API somewhere.)
OK, this makes some warped sense; after all, you can project/restrict the data as it is materialising out of the container file. However, just to be on the safe side: a Parquet file does not have the notion of a sequence file's "key" value, right?
I find this a bit odd: the Hadoop infrastructure builds around the fact that a sequence file may have a key, and I assume this key is used liberally to partition data into blocks for locality (not at the HDFS level, of course)? Spark has a lot of API calls that work on key/value pairs to do reductions, joins, etc. Now I have to do an extra step to map the keys out of the body of the materialised object (as sketched below). Weird.
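For what it's worth, this is roughly the extra step I mean; a minimal sketch assuming Spark 2.x with a SparkSession (the file path and the userId column are made-up examples):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("parquet-keys").getOrCreate()

    // Rows come back from Parquet with no key attached, unlike a SequenceFile.
    val df = spark.read.parquet("hdfs:///data/users.parquet")

    // The extra step: pull a column out of each row to act as the key,
    // so the pair-RDD operations (reduceByKey, join, ...) become available.
    val keyed = df.rdd.keyBy(row => row.getAs[Long]("userId"))

    // Now the usual key-based operations work.
    val counts = keyed.countByKey()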
So, are there any good reasons why a key is not a first-class citizen in the Parquet world?
Upvotes: 4
Views: 2469
Reputation: 117
You are correct. A Parquet file is not a key/value file format; it's a columnar format. Your "key" can be a specific column from your table (see the sketch below), but it's not like HBase, where you have a real key concept. Parquet is not a sequence file.
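A rough sketch of what that looks like in Spark, assuming Spark 2.x; the Event case class, the eventId column, and the paths are illustrative, not from your setup:

    import org.apache.spark.sql.SparkSession

    // The would-be key is just another column in the schema.
    case class Event(eventId: Long, payload: String)

    val spark = SparkSession.builder().appName("parquet-write").getOrCreate()
    import spark.implicits._

    // Write: there is no separate key slot, so eventId travels as a column.
    val events = Seq(Event(1L, "a"), Event(2L, "b")).toDS()
    events.write.parquet("hdfs:///tmp/events.parquet")

    // Read: recover the "key" by selecting that column back out.
    val byId = spark.read.parquet("hdfs:///tmp/events.parquet")
      .rdd
      .keyBy(_.getAs[Long]("eventId"))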
Upvotes: 4