Reputation: 1
My use case: first I need to do a batch ingestion so that a datasource is created for that batch. Next, for that same datasource, I need to append data using streaming (real-time) ingestion. How do I do this in Apache Druid?
I have tried batch ingestion and streaming ingestion separately.
Upvotes: 0
Views: 149
Reputation: 1
You can do both streaming and batch ingestion into a single datasource. First, do the batch ingestion. Then start the streaming ingestion, giving it the same datasource name that the batch ingestion used (this also works the other way around). That covers the use case.
Upvotes: -1
Reputation: 196
This is very common, and usually has something like Apache Airflow controlling batch ingestion while the supervisor handles consumption from Apache Kafka, Azure Event Hub, Amazon Kinesis, etc.
Both batch and streaming ingestion into Apache Druid allow you to specify a destination table.
Streaming ingestion is supervised, meaning that it runs continuously until you stop it. You can have only one supervisor per table.
Batch ingestion is asynchronous, and can be run at any time against a table.
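As a sketch, a native parallel batch ingestion spec targets the table via the `dataSource` field in `dataSchema`. The datasource name, timestamp column, directory, and file filter below are all placeholders, assuming JSON input files:

```json
{
  "type": "index_parallel",
  "spec": {
    "dataSchema": {
      "dataSource": "my_table",
      "timestampSpec": { "column": "ts", "format": "iso" },
      "dimensionsSpec": { "dimensions": ["user_id", "action"] }
    },
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": { "type": "local", "baseDir": "/data/batch", "filter": "*.json" },
      "inputFormat": { "type": "json" }
    }
  }
}
```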
As you build out an ingestion specification for streaming ingestion in the Druid console, notice that it produces a JSON document. That document contains the all-important table name in the `dataSource` element.
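For example, a Kafka supervisor spec appending to the same table would set `dataSource` to the same name as the batch spec. The topic, broker address, and column names here are illustrative only:

```json
{
  "type": "kafka",
  "spec": {
    "dataSchema": {
      "dataSource": "my_table",
      "timestampSpec": { "column": "ts", "format": "iso" },
      "dimensionsSpec": { "dimensions": ["user_id", "action"] }
    },
    "ioConfig": {
      "topic": "my_events_topic",
      "consumerProperties": { "bootstrap.servers": "localhost:9092" },
      "inputFormat": { "type": "json" }
    }
  }
}
```

Submitting this spec to the Overlord's supervisor API starts continuous consumption into `my_table`, alongside any batch jobs you run against it.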
Note that, in the current version of Druid, locking is (essentially) on time intervals. Therefore, so long as your batch ingestion and streaming ingestion do not write to the same time periods, you will avoid lock contention, and can run both batch and streaming ingestion at the same time on the same table.
See also streaming documentation on the official site and the associated tutorial.
Upvotes: 0