dahuin

Reputation: 87

How does schema inference work in spark.read.parquet?

I'm trying to read a Parquet file in Spark and I have a question.

How are the column types inferred when loading a Parquet file with spark.read.parquet?

Is there a fixed mapping dictionary (as in 1), or are the types inferred from the actual stored values (as in 2)?

Upvotes: 0

Views: 1258

Answers (1)

Spark uses the Parquet schema and converts it to an internal representation (i.e., StructType). It is a bit hard to find this information in the Spark docs, so I went through the code to find the mapping you are looking for here:

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala#L197-L281
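To see this in practice, here is a minimal PySpark sketch (the path /tmp/parquet_schema_demo and the column names are just illustrative, not from the question): the types you get back come from the schema stored in the Parquet file footer, which Spark converts to a StructType, rather than being inferred by sampling the stored values.

```python
# Minimal sketch: Spark reads the schema from the Parquet footer metadata,
# so the types round-trip exactly instead of being guessed from the data.
from decimal import Decimal

from pyspark.sql import SparkSession
from pyspark.sql.types import (
    DecimalType, IntegerType, StringType, StructField, StructType,
)

spark = SparkSession.builder.appName("parquet-schema-demo").getOrCreate()

# Write a DataFrame with an explicit schema; Parquet persists this schema
# in the file footer.
schema = StructType([
    StructField("id", IntegerType(), nullable=True),
    StructField("name", StringType(), nullable=True),
    StructField("amount", DecimalType(10, 2), nullable=True),
])
df = spark.createDataFrame([(1, "a", Decimal("12.34"))], schema=schema)
df.write.mode("overwrite").parquet("/tmp/parquet_schema_demo")

# On read, the Parquet types in the footer are mapped back to Catalyst
# types (the StructType below), without scanning the stored values.
loaded = spark.read.parquet("/tmp/parquet_schema_demo")
loaded.printSchema()
# root
#  |-- id: integer (nullable = true)
#  |-- name: string (nullable = true)
#  |-- amount: decimal(10,2) (nullable = true)
```

Note how `amount` comes back as decimal(10,2) and `id` as integer (not long): that information lives in the file's schema metadata, which is what the ParquetSchemaConverter linked above translates into Spark's StructType.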

Upvotes: 1
