Reputation: 349
I created a Parquet file with custom file-level metadata.
Now I'm trying to read that metadata from the Parquet file in (Azure) Databricks, but when I run the following code, none of the metadata stored in the file shows up.
storageaccount = 'zzzzzz'
containername = 'yyyyy'
access_key = 'xxxx'
spark.conf.set(f'fs.azure.account.key.{storageaccount}.blob.core.windows.net', access_key)
path = f"wasbs://{containername}@{storageaccount}.blob.core.windows.net/generated_example_10m.parquet"
data = spark.read.format('parquet').load(path)
data.printSchema()  # printSchema() prints the schema itself and returns None
Upvotes: 3
Views: 1548
Reputation: 2729
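I tried to reproduce this in my environment and got this output.
Use the code below, adding select("*", "_metadata") so that the hidden _metadata column (file path, name, size, and modification time) is returned alongside the data columns: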
path = "wasbs://<container>@<storage_account_name>.blob.core.windows.net/<file_path>.parquet"
data = spark.read.format('parquet').load(path).select("*", "_metadata")
display(data)
Alternatively, specify your schema explicitly and load the path with .select("*", "_metadata"):
# schema is a StructType that you define beforehand to match the file
df = spark.read \
    .format("parquet") \
    .schema(schema) \
    .load(path) \
    .select("*", "_metadata")
display(df)
Upvotes: 1