Korenaga

Reputation: 349

Databricks/Spark read custom metadata from Parquet file

I created a Parquet file with custom metadata at the file level. Reading it back, even with mergeSchema enabled, doesn't surface that metadata:

data = spark.read.option("mergeSchema", "true").parquet(path)
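For context, custom file-level metadata is typically attached at write time; a minimal sketch using pyarrow (the column name, key/value pair, and file name here are placeholders, not necessarily what was actually used):

import pyarrow as pa
import pyarrow.parquet as pq

# Merge custom key/value pairs into the schema metadata;
# Parquet stores these in the file footer at file level.
table = pa.table({"id": [1, 2, 3]})
merged = {**(table.schema.metadata or {}), b"my_key": b"my_value"}
pq.write_table(table.replace_schema_metadata(merged), "example.parquet")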

Now I'm trying to read that metadata back in (Azure) Databricks, but when I run the following code, the metadata that I know is there never shows up.

storageaccount = 'zzzzzz'
containername = 'yyyyy'
access_key = 'xxxx'
spark.conf.set(f'fs.azure.account.key.{storageaccount}.blob.core.windows.net', access_key)

path = f"wasbs://{containername}@{storageaccount}.blob.core.windows.net/generated_example_10m.parquet"
data = spark.read.format('parquet').load(path)
data.printSchema()  # printSchema() prints directly; wrapping it in print() just outputs None

Upvotes: 3

Views: 1548

Answers (1)

Vamsi Bitra

Reputation: 2729

I tried to reproduce the same thing in my environment and it worked as expected.

Please follow the code below and use select("*", "_metadata"):

path = "wasbs://<container>@<storage_account_name>.blob.core.windows.net/<file_path>.parquet"
data = spark.read.format('parquet').load(path).select("*", "_metadata")
display(data)

or

Alternatively, specify your schema explicitly when loading the path, again with .select("*", "_metadata"):

from pyspark.sql.types import StructType, StructField, LongType

# Example schema; replace with the actual columns of your file
schema = StructType([StructField("id", LongType(), True)])

df = spark.read \
  .format("parquet") \
  .schema(schema) \
  .load(path) \
  .select("*", "_metadata")

display(df)
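Note that _metadata is Spark's hidden file-metadata column (file path, file name, size, modification time), so it may not show custom key/value pairs stored in the Parquet footer. If those are what you need, one option is to read the footer directly with pyarrow; a sketch, assuming the file is reachable through a /dbfs mount (the mount path is a placeholder):

import pyarrow.parquet as pq

# Read only the Parquet footer; .metadata holds the file-level
# key/value pairs as a bytes -> bytes dict.
footer = pq.read_metadata("/dbfs/mnt/<mount_point>/generated_example_10m.parquet")
print(footer.metadata)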


Upvotes: 1
