adapt-dev

Reputation: 1768

How to create a Dockerfile for cassandra (or any database) that includes a schema?

I would like to create a dockerfile that builds a Cassandra image with a keyspace and schema already there when the image starts.

In general, how do you create a Dockerfile that will build an image that includes some step(s) that can't really be done until the container is running, at least the first time?

Right now, I have two steps: build the Cassandra image from an existing Cassandra Dockerfile and run it with a volume that maps the CQL schema files into a temporary directory, then run docker exec with cqlsh to import the schema once the container has started.

But that doesn't create an image with the schema - just a container. That container could be saved as an image, but that's cumbersome.

    docker run --name $CASSANDRA_NAME -d \
        -h $CASSANDRA_NAME \
        -v $CASSANDRA_DATA_DIR:/data \
        -v $CASSANDRA_DIR/target:/tmp/schema \
        tobert/cassandra:2.1.7

then

    docker exec $CASSANDRA_NAME cqlsh -f /tmp/schema/create_keyspace.cql
    docker exec $CASSANDRA_NAME cqlsh -f /tmp/schema/schema01.cql
    # etc
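The two manual steps above can be scripted so they don't fail when cqlsh runs before Cassandra is accepting connections. This is a minimal sketch, assuming the container name in `$CASSANDRA_NAME` and the `/tmp/schema` mount from the `docker run` command. `retry` and `load_schema` are hypothetical helper names, and `DESCRIBE KEYSPACES` is only used as a cheap connectivity probe.

```shell
#!/usr/bin/env bash

# retry MAX_TRIES DELAY_SECONDS CMD [ARGS...]
# Runs CMD until it succeeds; returns 1 after MAX_TRIES failed attempts.
retry() {
    local max_tries=$1 delay=$2 attempt
    shift 2
    for ((attempt = 1; ; attempt++)); do
        "$@" && return 0
        (( attempt >= max_tries )) && return 1
        >&2 echo "attempt $attempt/$max_tries failed: $*"
        sleep "$delay"
    done
}

# load_schema CONTAINER FILE...
# Waits until cqlsh inside CONTAINER can connect, then applies each CQL file
# in the order given, stopping at the first failure.
load_schema() {
    local container=$1 f
    shift
    retry 30 2 docker exec "$container" cqlsh -e "DESCRIBE KEYSPACES" >/dev/null || return 1
    for f in "$@"; do
        docker exec "$container" cqlsh -f "$f" || return 1
    done
}
```

For example: `load_schema "$CASSANDRA_NAME" /tmp/schema/create_keyspace.cql /tmp/schema/schema01.cql`. The paths are inside the container, so they are listed explicitly rather than globbed on the host.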

This works, but it makes it impossible to use with tools like Docker Compose, since linked containers/services start up at the same time and expect the schema to already be in place.

I saw one attempt where the Cassandra process was started in the background during the Dockerfile build and cqlsh was then run against it, but I don't think that worked too well.

Upvotes: 8

Views: 9295

Answers (3)

Rogelio Triviño

Reputation: 6559

Another approach our team uses is to create the schema on server init. Our Java code tests whether the schema exists and, if not (new environment, new deployment), creates it.

The same goes for every new table: an automatic CREATE TABLE creates the required tables for new data entities when the code runs against any new cluster (another developer's local machine, preproduction, production).

All this code is isolated inside our DataDriver classes for portability, in case we swap Cassandra for another database for some client or project.

This prevents a lot of hassle for both admins and developers. The approach also works for initial data loading, which we use in tests.
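The team's Java code isn't shown, but the "create only if missing" logic maps directly onto CQL's `IF NOT EXISTS` clauses, which make the same statements safe to run on every start-up. Here is a minimal sketch in shell rather than Java. The keyspace `my_keyspace` and table `events` are hypothetical, and `SimpleStrategy` with a replication factor of 1 only suits a single-node dev setup.

```shell
#!/usr/bin/env bash

# Prints idempotent CQL: each statement uses IF NOT EXISTS, so running it on
# every start-up creates what is missing and leaves existing objects alone.
# my_keyspace and events are hypothetical names for illustration.
schema_cql() {
    cat <<'EOF'
CREATE KEYSPACE IF NOT EXISTS my_keyspace
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

CREATE TABLE IF NOT EXISTS my_keyspace.events (
  id uuid PRIMARY KEY,
  payload text
);
EOF
}

# ensure_schema [CQLSH_ARGS...]
# Pipes the statements to cqlsh on stdin; extra arguments (host, port) are passed through.
ensure_schema() {
    schema_cql | cqlsh "$@"
}
```

Because nothing fails when the objects already exist, this can run unconditionally from an application's init code or an entrypoint script.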

Upvotes: 0

suraj1287

Reputation: 129

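Create a Dockerfile named Dockerfile_CAS:

    FROM cassandra:latest

    # Schema files to run on first start (absolute path, so it doesn't depend on WORKDIR)
    COPY ddl.cql /docker-entrypoint-initdb.d/

    # Modified copy of the image's entrypoint script (see below)
    COPY docker-entrypoint.sh /docker-entrypoint.sh
    RUN chmod +x /docker-entrypoint.sh

    ENTRYPOINT ["/docker-entrypoint.sh"]

    CMD ["cassandra", "-f"]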


Edit docker-entrypoint.sh and add the following just above the final exec "$@" line:

    # .cql files are retried in the background until Cassandra, started by exec "$@", accepts connections
    for f in /docker-entrypoint-initdb.d/*; do
        case "$f" in
            *.sh)  echo "$0: running $f"; . "$f" ;;
            *.cql) echo "$0: running $f" && until cqlsh -f "$f"; do >&2 echo "Cassandra is unavailable - sleeping"; sleep 2; done & ;;
            *)     echo "$0: ignoring $f" ;;
        esac
        echo
    done


Then rebuild the image:

    docker build -t suraj1287/cassandra -f Dockerfile_CAS .
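One caveat with that loop: each `.cql` file gets its own background retry loop, so with several files (say, one for the keyspace and one for the tables) nothing guarantees they run in order. A hedged variant, assuming the same `/docker-entrypoint-initdb.d` directory, uses a single background job that waits for cqlsh once and then applies the files sequentially in glob (alphabetical) order. `run_cql_files` and the `CQLSH` override are hypothetical names introduced here.

```shell
#!/usr/bin/env bash

# Command used to talk to Cassandra; overridable (hypothetical knob, e.g. for testing).
CQLSH=${CQLSH:-cqlsh}

# run_cql_files DIR
# Waits until Cassandra answers a trivial query, then applies DIR/*.cql one at
# a time in glob order (e.g. 01-keyspace.cql before 02-tables.cql).
run_cql_files() {
    local dir=$1 f
    until $CQLSH -e "DESCRIBE KEYSPACES" >/dev/null 2>&1; do
        >&2 echo "Cassandra is unavailable - sleeping"
        sleep 2
    done
    for f in "$dir"/*.cql; do
        [ -e "$f" ] || continue   # the glob matched nothing
        echo "$0: running $f"
        $CQLSH -f "$f" || return 1
    done
}

# In docker-entrypoint.sh, just above exec "$@":
#   run_cql_files /docker-entrypoint-initdb.d &
```

Prefixing the file names with numbers is then enough to control the order.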

Upvotes: 1

doanduyhai

Reputation: 8812

OK, I had this issue and someone suggested the following strategy:

  1. Start from an existing Cassandra Dockerfile, the official one for example
  2. Remove the ENTRYPOINT-related lines
  3. Copy the schema (.cql) and data (.csv) files into the image and put them somewhere, /opt/data for example
  4. Create a shell script that will be used as the last command to start Cassandra:

     a. Start Cassandra with $CASSANDRA_HOME/bin/cassandra

     b. If there is a $CASSANDRA_HOME/data/data/your_keyspace-xxxx folder and it's not empty, do nothing more

     c. Otherwise:

        1. Sleep for a while to give the server time to start listening on port 9042
        2. Once port 9042 is listening, execute the .cql script to load the CSV files
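Step 4 can be sketched as a bash start script. This is an illustration under assumptions, not the script the answer refers to: `INIT_CQL` (`/opt/data/init.cql`) and `KEYSPACE_DIR` (`my_keyspace`) are hypothetical placeholders, so point `KEYSPACE_DIR` at the folder from step b. Port 9042 is polled with bash's `/dev/tcp` redirection instead of a fixed sleep. Cassandra is started with `-f` as a background job so the script can `wait` on it and keep the container running.

```shell
#!/usr/bin/env bash

# Hypothetical defaults; override via environment variables.
CASSANDRA_HOME=${CASSANDRA_HOME:-/opt/cassandra}
KEYSPACE_DIR=${KEYSPACE_DIR:-$CASSANDRA_HOME/data/data/my_keyspace}
INIT_CQL=${INIT_CQL:-/opt/data/init.cql}

# Step b: succeeds if DIR exists and contains at least one entry.
keyspace_initialized() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Step c.1: succeeds once something accepts TCP connections on HOST:PORT.
port_open() {
    (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

main() {
    # Step a: start Cassandra in foreground mode as a background job of this shell.
    "$CASSANDRA_HOME/bin/cassandra" -f &
    local pid=$!

    if ! keyspace_initialized "$KEYSPACE_DIR"; then
        until port_open 127.0.0.1 9042; do sleep 2; done
        # Step c.2: create the schema and load the CSV data.
        "$CASSANDRA_HOME/bin/cqlsh" -f "$INIT_CQL"
    fi

    wait "$pid"   # keep the container alive as long as Cassandra runs
}

# Call main as the last line of the real start script:
# main "$@"
```

On a second start of the same container, step c is skipped because the keyspace folder is no longer empty.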

I found this procedure rather cumbersome, but there seems to be no other way around it. For a Cassandra hands-on lab, I found it easier to create a VM image using Vagrant and Ansible.

Upvotes: 6
