Reputation: 1890
I would like to create keyspaces and column-families at the start of my Cassandra container.
I tried the following in a docker-compose.yml
file:
# shortened for clarity
cassandra:
hostname: my-cassandra
image: my/cassandra:latest
command: "cqlsh -f init-database.cql"
The image my/cassandra:latest
contains init-database.cql
in /
. But this does not seem to work.
Is there a way to make this happen ?
Upvotes: 23
Views: 29069
Reputation: 1
you want to automatically create keyspaces and column families in a Cassandra container using Docker Compose by running a CQL script during startup. However, the issue is that Cassandra might not be ready to accept queries immediately after the container starts, causing the script to fail.
Solution:
To ensure that the CQL script is executed only after Cassandra is fully ready, we can use a combination of healthcheck and a custom entrypoint script. The healthcheck ensures that Cassandra is healthy, and the custom entrypoint script waits until Cassandra is ready (nodetool status reports the node as 'UN'). Once Cassandra is ready, the script executes the CQL file.
version: '3.9'
services:
cassandra:
image: cassandra:${CASSANDRA_VERSION:-5.0}
container_name: cassandra
- "7199:7199" # JMX
- "7000:7000" # cluster communication
- "7001:7001" # cluster communication
- "9042:9042" # native protocol clients
- "9160:9160" # thrift clients
volumes:
- ./cassandra_config/cassandra.yaml:/etc/cassandra/cassandra.yaml
- ./init-database.cql:/docker-entrypoint-initdb.d/init-database.cql
healthcheck:
test: ['CMD-SHELL', "nodetool status | grep '^UN' || exit 1"]
interval: 60s
timeout: 10s
retries: 3
start_period: 30s
networks:
- services
entrypoint: ["/bin/bash", "-c", "until nodetool status | grep '^UN'; do echo 'Waiting for Cassandra...'; sleep 5; done; cqlsh -f /docker-entrypoint-initdb.d/init-database.cql"]
networks:
services:
name: cassandra_network
Upvotes: 0
Reputation: 9881
I solved this problem by patching cassandra's docker-entrypoint.sh
so it will execute sh
and cql
files located in /docker-entrypoint-initdb.d
on startup. This is similar to how MySQL docker containers work.
Basically, I add a small script at the end of the docker-entrypoint.sh
(right before the last line, exec "$@"
), that will run the cql scripts once cassandra is up. A simplified version is:
INIT_DIR=docker-entrypoint-initdb.d
# this whole block will execute in the background
(
cd $INIT_DIR
# wait for cassandra to be ready
while ! cqlsh -e 'describe cluster' > /dev/null 2>&1; do sleep 6; done
echo "$0: Cassandra cluster ready: executing cql scripts found in $INIT_DIR"
# find and execute cql scripts, in name order
for f in $(find . -type f -name "*.cql" -print | sort); do
echo "$0: running $f"
cqlsh -f "$f"
echo "$0: $f executed"
done
) &
This solution works for all cassandra versions (at least until 3.11, as the time of writing).
Hence, you only have to build and use this cassandra image version, and then add proper initializations scripts to the container using docker-compose volumes.
A complete gist with a more robust entrypoint patch (and example) is available here.
Upvotes: 6
Reputation: 1077
I was also searching for the solution to this question, and here is the way how I accomplished it.
Here the second instance of Cassandra has a volume with the schema.cql and runs CQLSH command
My Version with healthcheck so we can get rid of sleep command
version: '2.2'
services:
cassandra:
image: cassandra:3.11.2
container_name: cassandra
ports:
- "9042:9042"
environment:
- "MAX_HEAP_SIZE=256M"
- "HEAP_NEWSIZE=128M"
restart: always
volumes:
- ./out/cassandra_data:/var/lib/cassandra
healthcheck:
test: ["CMD", "cqlsh", "-u cassandra", "-p cassandra" ,"-e describe keyspaces"]
interval: 15s
timeout: 10s
retries: 10
cassandra-load-keyspace:
container_name: cassandra-load-keyspace
image: cassandra:3.11.2
depends_on:
cassandra:
condition: service_healthy
volumes:
- ./src/main/resources/cassandra_schema.cql:/schema.cql
command: /bin/bash -c "echo loading cassandra keyspace && cqlsh cassandra -f /schema.cql"
NetFlix Version using sleep
version: '3.5'
services:
cassandra:
image: cassandra:latest
container_name: cassandra
ports:
- "9042:9042"
environment:
- "MAX_HEAP_SIZE=256M"
- "HEAP_NEWSIZE=128M"
restart: always
volumes:
- ./out/cassandra_data:/var/lib/cassandra
cassandra-load-keyspace:
container_name: cassandra-load-keyspace
image: cassandra:latest
depends_on:
- cassandra
volumes:
- ./src/main/resources/cassandra_schema.cql:/schema.cql
command: /bin/bash -c "sleep 60 && echo loading cassandra keyspace && cqlsh cassandra -f /schema.cql"
P.S I found this way at one of the Netflix Repos
Upvotes: 30
Reputation: 1385
We recently tried to solve a similar problem in KillrVideo, a reference application for Cassandra. We are using Docker Compose to spin up the environment needed by the application which includes a DataStax Enterprise (i.e. Cassandra) node. We wanted that node to do some bootstrapping the first time it was started to install the CQL schema (using cqlsh
to run the statements in a .cql
file just like you're trying to do). Basically the approach we took was to write a shell script for our Docker entrypoint that:
cqlsh -f
to run some CQL statements and init the schema.We just use the existence of a file to indicate whether the node has already been bootstrapped and check that on startup to determine whether we need to do that logic above or can just start it normally. You can see the results in the killrvideo-dse-docker repository on GitHub.
There is one caveat to this approach. This worked great for us because in our reference application, we're only spinning up a single node (i.e. we aren't creating a cluster with more than one node). If you're running multiple nodes, you'll probably want to make sure that only one of the nodes does the bootstrapping to create the schema because multiple clients modifying the schema simultaneously can cause some issues with your cluster. (This is a known issue and will hopefully be fixed at some point.)
Upvotes: 14