user7097216


How to read Parquet files from an S3 bucket in NiFi?

I am trying to read Parquet files from an S3 bucket in NiFi. To read the files I used the ListS3 and FetchS3Object processors, followed by an ExtractAttribute processor. Up to that point everything looked fine.

The files are stored as .parquet.gz, and I have not found any way to generate usable flowfiles from them. My end goal is to load the data into Snowflake.

FetchParquet works with HDFS, which we do not use.

My next option is to use the ExecuteScript processor (with Python) to read these Parquet files and save them back as text.

Can somebody please suggest a workaround?

Upvotes: 0

Views: 2474

Answers (1)

Bryan Bende

Reputation: 18630

It depends what you need to do with the Parquet files.

For example, if you wanted to get them to your local disk, then ListS3 -> FetchS3Object -> PutFile would work fine. That scenario just moves bytes around, so it doesn't matter whether the content is Parquet or not.

If you need to actually interpret the Parquet data in some way, which it sounds like you do for getting it into a database, then you need to use FetchParquet to convert from Parquet to some other format like Avro, JSON, or CSV, and then send that to one of the database processors.
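Since the files are named .parquet.gz, it may be worth checking first what they actually contain: a Parquet file that was gzipped as a whole needs to be decompressed before anything can interpret it (in NiFi, CompressContent in decompress mode should be able to do that), whereas a Parquet file that only uses gzip internally as its column codec is already a normal Parquet file. A minimal sketch of that check in plain Python, run outside NiFi on a downloaded sample, is below. It relies on the gzip magic bytes and on the `PAR1` marker that Parquet files carry at both ends; the function names are made up for illustration.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"   # first two bytes of any gzip stream
PARQUET_MAGIC = b"PAR1"    # Parquet files start and end with these 4 bytes


def unwrap_gzip(data: bytes) -> bytes:
    """Return the decompressed payload if data is a gzip stream, else data unchanged."""
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)
    return data


def looks_like_parquet(data: bytes) -> bool:
    """Check for the Parquet magic number at both ends of the content."""
    # 12 bytes = leading magic + 4-byte footer length + trailing magic
    return (
        len(data) >= 12
        and data[:4] == PARQUET_MAGIC
        and data[-4:] == PARQUET_MAGIC
    )


def classify(data: bytes) -> str:
    """Describe the content as plain Parquet, gzip-wrapped Parquet, or unknown."""
    if looks_like_parquet(data):
        return "plain parquet"
    if data[:2] == GZIP_MAGIC and looks_like_parquet(unwrap_gzip(data)):
        return "gzip-wrapped parquet"
    return "unknown"
```

To use it, read one sample file with `open(path, "rb").read()` and pass the bytes to `classify`. This only inspects the magic bytes; it does not validate the Parquet footer or schema.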

You can use the FetchParquet/PutParquet processors, or any other HDFS processor, with S3 by configuring a core-site.xml with an S3 filesystem.
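As a rough illustration of what such a core-site.xml might contain, here is a small Python sketch that renders one. The `fs.s3a.*` property names come from Hadoop's S3A connector; the bucket name, the credentials, and the choice to point `fs.defaultFS` at the bucket are placeholders and assumptions, not something taken from the linked thread. The S3A classes also have to be available to the processor, so treat this as a starting point rather than a verified setup.

```python
from xml.sax.saxutils import escape


def build_core_site(properties: dict) -> str:
    """Render a Hadoop-style <configuration> document from a name -> value mapping."""
    lines = ["<configuration>"]
    for name, value in properties.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(value)}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


# Placeholder values (assumptions): substitute your own bucket and credentials.
S3A_PROPERTIES = {
    "fs.defaultFS": "s3a://your-bucket",
    "fs.s3a.access.key": "YOUR_ACCESS_KEY",
    "fs.s3a.secret.key": "YOUR_SECRET_KEY",
    "fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
}

if __name__ == "__main__":
    print(build_core_site(S3A_PROPERTIES))
```

The printed XML can be saved as core-site.xml and referenced from the processor's Hadoop configuration resources setting.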

http://apache-nifi-users-list.2361937.n4.nabble.com/PutParquet-with-S3-td3632.html

Upvotes: 1
