Reputation: 11
I have JSON files landing in our blob storage with two fields:
offset (integer)
value (base64)
The value column is JSON with unicode (and that's why it's base64-encoded).
{
"offset": 1,
"value": "eyJfaWQiOiAiNjQxY2I3MWQyY...a very long base64-encoded text"
}
The challenge is that this base64-encoded value JSON is very large, with 100+ fields, so we cannot define the schema; we can only provide some schema hints. Auto Loader seems the best fit.
I tried to use Auto Loader with schema hints and other options. It always infers value as a string and cannot parse it back to JSON without me providing the schema.
I want Databricks to infer the schema automatically. The schema is dynamic: fields can be added or removed at any time, so a fixed schema won't work at all.
But Auto Loader always infers value as a string (the inner JSON is escaped as a string inside the value column), which makes automatic inference impossible.
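Here is a minimal sketch of the kind of read I'm attempting (the paths, checkpoint location, and hint are placeholders, not our real setup):

df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.schemaLocation", "/mnt/checkpoints/raw_schema")  # placeholder
      .option("cloudFiles.schemaHints", "offset BIGINT")
      .load("/mnt/raw/events"))                                            # placeholder

# df.printSchema() gives:
#   offset: bigint
#   value: string   <- the inner JSON stays an opaque string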
Upvotes: 0
Views: 427
Reputation: 74749
Have you tried to base64-decode the value column and use the from_json standard function? That'd be my first approach to handle this case.
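For reference, a minimal sketch of that idea in PySpark (paths are placeholders, and it samples a single row to infer the inner schema, which only works if that row carries all the fields):

from pyspark.sql.functions import col, unbase64, from_json, schema_of_json

# Batch read of the outer JSON files (placeholder path).
raw = spark.read.json("/mnt/raw/events")

# Decode the base64 payload back to a JSON string.
decoded = raw.withColumn("value_json", unbase64(col("value")).cast("string"))

# Infer the inner schema from one sampled row (assumes that row has every field),
# then parse all rows with it.
sample = decoded.select("value_json").first()["value_json"]
ddl_schema = decoded.select(schema_of_json(sample).alias("s")).first()["s"]

parsed = decoded.withColumn("value_struct", from_json(col("value_json"), ddl_schema))
parsed.select("offset", "value_struct.*").show(truncate=False)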
If that does not work, I'd do the data processing in two steps (you could use Databricks Jobs or even Delta Live Tables pipelines).
The first step would load the incoming files using Auto Loader, decode the value column, and write the decoded data out to blob storage for the next step to read back (again with Auto Loader); see the sketch below.
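A rough sketch of that two-step shape in PySpark (all paths and checkpoint locations are placeholders; the point is that the second stream sees plain, decoded JSON, so Auto Loader can infer and evolve the full schema):

from pyspark.sql.functions import col, unbase64

# Step 1: Auto Loader reads the raw files, decodes value, and writes the decoded
# JSON documents out as text files (one JSON document per line).
raw = (spark.readStream
       .format("cloudFiles")
       .option("cloudFiles.format", "json")
       .option("cloudFiles.schemaLocation", "/mnt/checkpoints/raw_schema")
       .load("/mnt/raw/events"))

# The offset column is dropped here for simplicity; carry it along if it's needed downstream.
decoded = raw.select(unbase64(col("value")).cast("string").alias("value"))

(decoded.writeStream
 .format("text")   # the text sink writes the single string column verbatim
 .option("checkpointLocation", "/mnt/checkpoints/decode")
 .start("/mnt/staging/decoded"))

# Step 2: a second Auto Loader stream reads the decoded files as JSON and can now
# infer (and evolve) the full 100+ field schema.
clean = (spark.readStream
         .format("cloudFiles")
         .option("cloudFiles.format", "json")
         .option("cloudFiles.schemaLocation", "/mnt/checkpoints/clean_schema")
         .option("cloudFiles.schemaEvolutionMode", "addNewColumns")
         .load("/mnt/staging/decoded"))

(clean.writeStream
 .format("delta")
 .option("checkpointLocation", "/mnt/checkpoints/clean")
 .start("/mnt/tables/events"))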
I really think there's no way Auto Loader can handle base64-encoded data as JSON so it needs some help (and hence this Hadoop MapReduce-like data processing pipeline).
I think you could even "convert" the incoming JSON files into Parquet for faster loading, and perhaps Delta tables could work fine too. There are two JSON layers in the data (files); I'm referring to the JSON format of the entire data file when thinking of the Parquet conversion.
Upvotes: 0