I am trying to read Parquet files from an S3 bucket in NiFi. To read the files I used the ListS3 and FetchS3Object processors, followed by an ExtractAttribute processor. Up to that point everything looked fine. The files are .parquet.gz files, and I was not able to generate flowfiles from them by any means. My end goal is to load the data into NoSQL (Snowflake).
FetchParquet works with HDFS, which we do not use.
My next option is to use the ExecuteScript processor (with Python) to read these Parquet files and save them back as text. Can somebody please suggest a workaround?
Upvotes: 0
Views: 2474
Reputation: 18630
It depends what you need to do with the Parquet files.
For example, if you wanted to get them to your local disk, then ListS3 -> FetchS3Object -> PutFile would work fine. This scenario just moves bytes around, so it doesn't really matter whether the content is Parquet or not.
If you need to actually interpret the Parquet data in some way, which it sounds like you do for getting it into a database, then you need to use FetchParquet to convert from Parquet to some other format like Avro, JSON, or CSV, and then send that to one of the database processors.
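Since your files are named .parquet.gz, it may be worth checking first whether each object is a Parquet file wrapped in an outer gzip layer or a plain Parquet file that only uses gzip internally, because a Parquet reader expects the latter. Here is a minimal sketch of that check in Python, relying only on the well-known magic bytes (gzip streams start with 0x1f 0x8b; Parquet files start and end with PAR1). The function names `classify_payload` and `unwrap_if_gzipped` are made up for illustration and are not part of NiFi.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"   # first two bytes of any gzip stream
PARQUET_MAGIC = b"PAR1"    # first and last four bytes of a Parquet file


def classify_payload(data: bytes) -> str:
    """Return 'gzip', 'parquet', or 'unknown' for a fetched object's bytes."""
    if data[:2] == GZIP_MAGIC:
        return "gzip"
    # Smallest possible Parquet file: magic + 4-byte footer length + magic.
    if len(data) >= 12 and data[:4] == PARQUET_MAGIC and data[-4:] == PARQUET_MAGIC:
        return "parquet"
    return "unknown"


def unwrap_if_gzipped(data: bytes) -> bytes:
    """Strip an outer gzip layer if present; otherwise return the bytes unchanged."""
    if classify_payload(data) == "gzip":
        return gzip.decompress(data)
    return data
```

If the objects do turn out to be gzip-wrapped, a CompressContent processor in decompress mode placed after FetchS3Object should play the same role as `unwrap_if_gzipped` inside the flow, though I have not tested that against your files.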
You can use the FetchParquet/PutParquet processors, or any other HDFS processors, with S3 by configuring a core-site.xml with an S3 filesystem.
http://apache-nifi-users-list.2361937.n4.nabble.com/PutParquet-with-S3-td3632.html
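As a rough illustration of what such a core-site.xml might contain, here is a small Python sketch that renders one from a dictionary. The property names are the standard Hadoop S3A ones; the credential values and endpoint are placeholders, and the helper name `build_core_site` is hypothetical. I am also assuming the S3A connector jars (hadoop-aws plus the AWS SDK) are made available to the processor, which this sketch does not cover.

```python
import xml.etree.ElementTree as ET


def build_core_site(properties: dict) -> str:
    """Render Hadoop-style <configuration> XML from a name -> value mapping."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    if hasattr(ET, "indent"):  # pretty-print where available (Python 3.9+)
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# Placeholder credentials and endpoint: replace with your own values.
s3a_properties = {
    "fs.s3a.access.key": "YOUR_ACCESS_KEY",
    "fs.s3a.secret.key": "YOUR_SECRET_KEY",
    "fs.s3a.endpoint": "s3.amazonaws.com",
}

core_site_xml = build_core_site(s3a_properties)
print(core_site_xml)
```

You would save the output as core-site.xml, point the processor's Hadoop Configuration Resources property at it, and then refer to the files with s3a:// paths.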
Upvotes: 1