Reputation: 1464
I have a single Kafka cluster in a Docker image.
1) I want to inject data into the Kafka topics every time the Docker image starts up. It is a one-time initialization process as part of the Docker startup.
2) The initialization data used in the above step comes from a predefined scenario, so the data would already be available in the topics (as part of the predefined scenario). How can I persist this data to a file and inject it at Docker startup, for step 1 above?
I looked on Docker Hub and could not find any relevant images.
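For context, this is roughly how I plan to capture the scenario data to a file in the first place (a sketch assuming the stock Kafka CLI tools; the topic name is just an example, and flag names differ across Kafka versions):
kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic scenario-topic --from-beginning \
  --timeout-ms 10000 > seed-data.txt  # exits once no new messages arrive for 10s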
Upvotes: 2
Views: 566
Reputation: 28120
I would suggest trying to make this happen as part of the build step if possible. If you do it as part of the build, it will be cached, and you won't have to repeat it every time you start the container.
A pattern I've used with persistent storage (databases) is a build step like this:
Dockerfile:
...
COPY setup.sh /code/setup.sh
RUN /code/setup.sh
...
setup.sh (this is pseudocode)
./start_kafka.sh & # start the service in the background
./wait_for_kafka_to_be_available.sh # If the service comes with good init scripts, they might already do this for you
./populate_data.sh # run a client which puts data into the queue
./stop_kafka.sh # Do a clean shutdown, a proper init script might provide this as well
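Concretely, a setup.sh for Kafka could look something like this. This is a minimal sketch assuming the stock Apache Kafka distribution installed under /opt/kafka and a seed file baked into the image; the paths, topic name, and seed file are my assumptions, and CLI flag names vary between Kafka versions (older releases use --broker-list / --zookeeper instead of --bootstrap-server):
#!/bin/sh
set -e
# Start ZooKeeper and the broker in the background (older Kafka requires ZooKeeper).
/opt/kafka/bin/zookeeper-server-start.sh /opt/kafka/config/zookeeper.properties &
/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties &
# Wait until the broker answers metadata requests before producing.
until /opt/kafka/bin/kafka-topics.sh --bootstrap-server localhost:9092 --list >/dev/null 2>&1; do
  sleep 1
done
# Replay the persisted scenario data, one message per line.
/opt/kafka/bin/kafka-console-producer.sh --bootstrap-server localhost:9092 \
  --topic scenario-topic < /code/seed-data.txt
# Clean shutdown so the log segments are flushed into the image layer.
/opt/kafka/bin/kafka-server-stop.sh
/opt/kafka/bin/zookeeper-server-stop.sh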
Now when the container starts, it should read the persisted data and start up much faster.
If for some reason you can't do that and you need to do it at runtime, you're probably better off using an init system. You can find an example of this (using s6 as the init system) here: https://github.com/dnephin/docker-swarm-slave. It starts two services (dind and swarm-slave); in your case, one of the services would run wait_for_kafka_to_be_available.sh and ./populate_data.sh, then exit.
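For illustration, the run script of such an s6 service might look like this (the service directory path is an assumption, the script names come from the pseudocode above, and s6-svc -O is the standard way to mark a service as run-once):
/etc/s6/populate-kafka/run:
#!/bin/sh
# One-shot service: seed the topics once Kafka is reachable, then stop.
./wait_for_kafka_to_be_available.sh
./populate_data.sh
# Tell s6 not to restart this service when it exits (cwd is the service dir).
s6-svc -O .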
Upvotes: 1