Reputation: 163
I'm very new to Druid and want to know how to ingest Parquet files stored on S3 into Druid. We receive data in CSV format and standardise it to Parquet in the data lake. It then needs to be loaded into Druid.
Upvotes: 0
Views: 1013
Reputation: 163
Instead of trying to ingest Parquet files from S3, I streamed the data to a Kinesis stream and used that as the source for Druid.
Upvotes: 1
Reputation: 2276
You have to add druid-parquet-extensions to druid.extensions.loadList in the common.runtime.properties file. After that, restart the Druid server.
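As a rough illustration of that config change, here is a minimal Python sketch that merges extensions into the loadList line of a properties file's text. The helper name add_extensions is hypothetical. Including druid-s3-extensions alongside the Parquet extension is my assumption (reading from S3 normally needs it), so check it against the docs for your Druid version.

```python
import json
import re


def add_extensions(properties_text, extensions):
    """Return properties_text with `extensions` present in druid.extensions.loadList.

    Hypothetical helper for illustration; it assumes the loadList value is a
    JSON array on a single line, as in Druid's example configs.
    """
    pattern = re.compile(r"^druid\.extensions\.loadList\s*=\s*(.*)$", re.MULTILINE)
    match = pattern.search(properties_text)
    current = json.loads(match.group(1)) if match else []
    # Keep existing entries in order and append only the missing ones.
    merged = current + [e for e in extensions if e not in current]
    new_line = "druid.extensions.loadList=" + json.dumps(merged)
    if match:
        return properties_text[:match.start()] + new_line + properties_text[match.end():]
    # No loadList line yet: append one at the end of the file.
    sep = "" if not properties_text or properties_text.endswith("\n") else "\n"
    return properties_text + sep + new_line + "\n"


if __name__ == "__main__":
    sample = 'druid.extensions.loadList=["druid-hdfs-storage"]\n'
    # Assumption: S3 reads also need druid-s3-extensions.
    print(add_extensions(sample, ["druid-s3-extensions", "druid-parquet-extensions"]))
```

Editing the file by hand works just as well. The point is only that the value is a JSON array and that every Druid service has to be restarted to pick it up.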
However, only ingesting a Parquet file from a local source is documented. I couldn't verify loading from S3 because my files were encrypted.
Try adding the above extension and then read from S3 just like you'd ingest a regular file from S3.
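For what it's worth, here is an untested sketch of what such a native batch ingestion spec might look like, built as a Python dict. It assumes a recent Druid release where the s3 input source and the parquet input format are available. The function name, datasource, bucket prefix, and column names are all placeholders, not values from the question.

```python
import json


def build_parquet_s3_spec(datasource, s3_prefix, timestamp_column, dimensions):
    """Build an index_parallel task spec that reads Parquet files under an S3 prefix.

    Hypothetical helper; all argument values are supplied by the caller.
    """
    return {
        "type": "index_parallel",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                "timestampSpec": {"column": timestamp_column, "format": "auto"},
                "dimensionsSpec": {"dimensions": list(dimensions)},
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "day",
                    "queryGranularity": "none",
                    "rollup": False,
                },
            },
            "ioConfig": {
                "type": "index_parallel",
                # Assumption: files live under this prefix and Druid has
                # credentials that can read (and decrypt) them.
                "inputSource": {"type": "s3", "prefixes": [s3_prefix]},
                "inputFormat": {"type": "parquet"},
            },
            "tuningConfig": {"type": "index_parallel"},
        },
    }


if __name__ == "__main__":
    spec = build_parquet_s3_spec(
        "events",                          # placeholder datasource name
        "s3://my-bucket/parquet/events/",  # placeholder S3 prefix
        "event_time",                      # placeholder timestamp column
        ["user_id", "country"],            # placeholder dimensions
    )
    print(json.dumps(spec, indent=2))
```

The printed JSON can be pasted into the web console's task submission dialog or POSTed to /druid/indexer/v1/task. I have not run this against encrypted files, so treat it as a starting point.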
Upvotes: 0