Marcin

Reputation: 454

Functional tests of an application running on Spark Streaming with Kafka

I'm setting up functional tests for applications running on Spark Streaming with Kafka. The steps to be performed are:

  1. Start the ZooKeeper server
  2. Start the Kafka server
  3. Start the message producer to feed Kafka with the necessary data
  4. Start the Spark Streaming application
  5. Wait for 5 minutes
  6. Stop the message producer
  7. Stop the Spark Streaming application
  8. Stop the Kafka server
  9. Stop the ZooKeeper server
  10. Validate the output

What is the professional way to do this, other than a simple bash script?

I think this is quite a general question, not strictly related to Spark Streaming and Kafka. Maybe there are testing frameworks that support setting up the environment, running multiple processes in parallel, and data validation/assertions.

Upvotes: 0

Views: 969

Answers (2)

Christoph Deppisch

Reputation: 2216

Consider the Citrus test framework (http://citrusframework.org/), which could be the all-in-one test framework for you.

  • Zookeeper access: check
  • Docker integration: check
  • Kafka integration via Apache Camel: check
  • Waiting for x period of time: check
  • Validating outcome: check

Also consider using the Fabric8 Docker Maven plugin (https://github.com/fabric8io/docker-maven-plugin) to set up the Docker test environment before the Citrus tests are executed within the same build run.

Here is an example of both tools working together for automated integration testing: https://github.com/christophd/citrus-samples/tree/master/sample-docker

Upvotes: 1

Vlad Vlaskin

Reputation: 110

Maybe there are some testing frameworks which support setting up the environment, running multiple processes in parallel and data validation/assertions.

Unfortunately, there is no all-in-one framework out there.

The one-line answer would be: use docker-compose with the simplest unit-testing or Gherkin-based framework of your choice.

Considering the steps above as:

  1. Start the env

  2. Generate Kafka messages / Validate

  3. Shut down the env

Docker Compose would be the best choice for steps #1 and #3.

version: '2'
services:
  kafka:
    # this container already has ZooKeeper built in
    image: spotify/kafka
    ports:
      - "2181:2181"
      - "9092:9092"
  # this is just a mock Spark container; you'll have to replace it with a
  # Docker container that can host your Spark app
  spark:
    image: epahomov/docker-spark:lightweighted
    depends_on:
      - kafka

The idea of the compose file is that you can start your env with one command:

docker-compose up

And the environment setup will be pretty much portable across dev machines and build servers.
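For step #3, tearing the environment down is just as simple (add -v if you also want to remove any volumes the containers created):

docker-compose down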

For step #2, any test framework will do.

The scenario would look like this (a skeleton sketch follows the list):

  • Start the environment / Make sure it's started
  • Start generating messages
  • Make assertions / Sleep my sweet thread
  • Shut down the environment
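Sketched with ScalaTest (one of the frameworks recommended below), the skeleton could look like this. It's a minimal sketch, not a definitive implementation: it assumes docker-compose is on the PATH and the compose file sits in the working directory.

import org.scalatest.BeforeAndAfterAll
import org.scalatest.flatspec.AnyFlatSpec
import scala.sys.process._

class PipelineSpec extends AnyFlatSpec with BeforeAndAfterAll {

  // Step #1: start the environment once for the whole suite.
  override def beforeAll(): Unit =
    require("docker-compose up -d".! == 0, "environment failed to start")

  // Step #3: shut the environment down, however the tests went.
  override def afterAll(): Unit =
    "docker-compose down".!

  // Step #2 (generating messages and asserting) lives in the test
  // bodies; the next sketch below shows one way to write it.
  "the pipeline" should "process the generated messages" in {
    // feed Kafka, then assert on the output
  }
}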

Talking about frameworks:

  • Scala: ScalaTest, which gives you a good spectrum of async assertions and parallel execution.

  • Python: Behave (be careful with multiprocessing there) or a unit-testing framework such as pytest.
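To make step #2 concrete, here is a minimal ScalaTest sketch of the generate-and-assert part. The topic name, message count, and the readOutput() helper are assumptions you'd replace with your own; eventually swaps the fixed 5-minute sleep for polling until the assertion holds or the timeout expires.

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.scalatest.concurrent.Eventually
import org.scalatest.flatspec.AnyFlatSpec
import org.scalatest.matchers.should.Matchers
import org.scalatest.time.{Minutes, Seconds, Span}

class StreamingOutputSpec extends AnyFlatSpec with Matchers with Eventually {

  // Poll for up to 5 minutes, checking every 5 seconds.
  implicit override val patienceConfig: PatienceConfig =
    PatienceConfig(timeout = Span(5, Minutes), interval = Span(5, Seconds))

  "the Spark Streaming app" should "process everything fed into Kafka" in {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092") // port exposed by the compose file
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    try {
      (1 to 100).foreach { i =>
        producer.send(new ProducerRecord[String, String]("input-topic", s"key-$i", s"value-$i"))
      }
      producer.flush()
    } finally producer.close()

    // readOutput() is a hypothetical helper: read from wherever your
    // app writes (an output topic, a file sink, a database, ...).
    eventually {
      readOutput().size should be >= 100
    }
  }

  private def readOutput(): Seq[String] =
    ??? // depends entirely on your application's sink
}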

Do not let the name "unit-testing framework" confuse you: only the test environment determines whether a test is a unit, module, system, or integration test, not the tool.

If someone uses a unit-test framework and writes MyZookeeperConnect("192.168.99.100:2181") in it, it's not a unit test anymore; even a unit-test framework can't help it :)

To glue steps #1, #2, and #3 together, a simple bash script would be my choice.

Upvotes: 1
