Maarab

Reputation: 163

How to ingest Parquet files residing on AWS S3 into Druid

I'm very new to Druid and want to know how to ingest Parquet files stored on S3 into Druid. We receive data in CSV format and standardise it to Parquet in the Data Lake. It then needs to be loaded into Druid.

Upvotes: 0

Views: 1013

Answers (2)

Maarab

Reputation: 163

Instead of trying to ingest Parquet files from S3, I streamed the data to a Kinesis stream and used that as the source for Druid.

Upvotes: 1

Shailesh

Reputation: 2276

You have to add druid-parquet-extensions to druid.extensions.loadList in the common.runtime.properties file.

After that, restart the Druid server so the extension is loaded.

However, only ingesting a Parquet file from a local source is documented. I couldn't verify loading from S3 because my files were encrypted.

Try adding the above extension and then read from S3 just like you'd ingest a regular file from S3.
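
To make that concrete, here is a rough sketch of what the setup might look like. I haven't tested it. It assumes a Druid version with native batch ingestion (inputSource/inputFormat, 0.17 or later), and it also loads druid-s3-extensions, which I'm adding here because that is the extension that provides the "s3" input source. The datasource name, bucket path, timestamp column and dimensions are made-up placeholders, so swap in your own.

```python
import json

# Extensions to list in druid.extensions.loadList (common.runtime.properties).
# druid-s3-extensions is an addition to the answer above: it provides the
# "s3" input source. Keep any extensions you already load in the same list.
EXTENSIONS = ["druid-parquet-extensions", "druid-s3-extensions"]


def load_list_line(extensions=EXTENSIONS):
    """Render the druid.extensions.loadList property line."""
    return "druid.extensions.loadList=" + json.dumps(extensions)


def build_ingestion_spec(datasource, s3_prefix, timestamp_column, dimensions):
    """Build a native batch (index_parallel) spec that reads Parquet from S3."""
    return {
        "type": "index_parallel",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                "timestampSpec": {"column": timestamp_column, "format": "auto"},
                "dimensionsSpec": {"dimensions": dimensions},
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "day",
                    "queryGranularity": "none",
                    "rollup": False,
                },
            },
            "ioConfig": {
                "type": "index_parallel",
                # Every object under this S3 prefix is read as a Parquet file.
                "inputSource": {"type": "s3", "prefixes": [s3_prefix]},
                "inputFormat": {"type": "parquet"},
            },
            "tuningConfig": {"type": "index_parallel"},
        },
    }


if __name__ == "__main__":
    print(load_list_line())
    spec = build_ingestion_spec(
        datasource="example_datasource",           # hypothetical name
        s3_prefix="s3://example-bucket/parquet/",  # hypothetical bucket/path
        timestamp_column="ts",                     # hypothetical column
        dimensions=["country", "device"],          # hypothetical columns
    )
    print(json.dumps(spec, indent=2))
```

The printed JSON is what you would submit as an ingestion task, for example by POSTing it to the Overlord at /druid/indexer/v1/task. Druid still needs AWS credentials that can read the bucket, and I can't say whether this works with encrypted files.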

Upvotes: 0
