eaolson

Reputation: 15094

Get BLOB out of Avro FlowFile

I'm retrieving some binary files (in this case, some PDFs) from a database using ExecuteSQL, which returns the result in an Avro FlowFile. I can't figure out how to get the binary result out of the Avro records.

I've tried using ConvertAvroToJSON, which gives me an object like:

{"MYBLOB": {"bytes": "%PDF-1.4\n [...] " }}

However, using EvaluateJSONPath to grab $.MYBLOB.bytes corrupts the data, because the binary bytes get converted to UTF-8.
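To illustrate why this kind of conversion is lossy, here is a minimal Python sketch. It does not reproduce what NiFi does internally; it only assumes that somewhere along the way the raw bytes are interpreted as UTF-8 text and written back out. The sample bytes are hypothetical, modelled on the binary comment line that often follows a PDF header.

```python
# Hypothetical sample: a PDF-like header followed by bytes above 0x7F,
# which are not valid UTF-8 on their own.
original = b"%PDF-1.4\n\xe2\xe3\xcf\xd3\n"

# Interpreting the bytes as UTF-8 replaces every invalid sequence
# with U+FFFD (the replacement character).
as_text = original.decode("utf-8", errors="replace")

# Writing the text back out as UTF-8 cannot restore the original bytes.
round_tripped = as_text.encode("utf-8")

print(round_tripped == original)          # the comparison fails
print(len(original), len(round_tripped))  # the lengths differ as well
```

Once a byte has been replaced there is no way to recover it downstream, so the fix has to happen before the bytes are ever treated as UTF-8 text.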

None of the record writer options with ConvertRecord seem appropriate for binary data.

The best solution I can think of is to base64-encode the binary before it leaves the database, so that I'm dealing only with character data and can decode it in NiFi. But that adds extra steps, and I'd prefer not to do that.

Upvotes: 0

Views: 408

Answers (1)

mattyb

Reputation: 12083

As a workaround, you may need a scripted solution in this case, one that reads the field and decodes it using your own encoding. In any case, please feel free to file a Jira case: ConvertAvroToJSON is deprecated, but we should support character sets for the JsonRecordSetWriter in ExecuteSQLRecord/ConvertRecord (if that doesn't work for you either).
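As a rough sketch of what "decode it using your own encoding" could look like, the Python below assumes the JSON follows Avro's JSON encoding, in which a bytes value is written as a string whose code points 0-255 each stand for one byte. Under that assumption, encoding the string as ISO-8859-1 (latin-1) gives the original bytes back. The helper name `blob_from_avro_json` is hypothetical, and this only works if the JSON text has not already been through a lossy UTF-8 conversion; inside NiFi the same idea would have to be adapted to whichever scripting processor and language you use.

```python
import json


def blob_from_avro_json(doc: str, field: str) -> bytes:
    """Recover raw bytes from a {"FIELD": {"bytes": "..."}} JSON record.

    Assumes each character of the string is a code point in 0-255 that
    stands for exactly one byte (Avro's JSON encoding for bytes).
    """
    record = json.loads(doc)
    text = record[field]["bytes"]
    # latin-1 maps code points 0-255 one-to-one onto byte values 0-255.
    return text.encode("latin-1")


# Round-trip check with every possible byte value (hypothetical test data).
original = bytes(range(256))
doc = json.dumps({"MYBLOB": {"bytes": original.decode("latin-1")}})
recovered = blob_from_avro_json(doc, "MYBLOB")
print(recovered == original)
```

If the string contains a code point above 255, `encode("latin-1")` raises `UnicodeEncodeError`, which is a useful signal that the data was already mangled before it reached the script.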

Upvotes: 0
