Reputation: 23
I hope you can help me here. I am building a small Data Science environment at home, and I am having trouble understanding how to set up the orchestration layer properly (I am also not convinced that the other components I have selected for the architecture are the most appropriate). If anyone has experience with any of these components and can give me some recommendations, I would greatly appreciate it.
I am using old computers and laptops to build the environment (cheaper than using the cloud), some of them with NVIDIA GPUs. Here is the architecture I have in mind.
So here is my question: assuming I develop an algorithm that requires training, and I need to orchestrate periodic re-training of the model, how do I run the re-training automatically? I know I can use NiFi (or, alternatively, Apache Airflow), but the re-training needs to run in a GPU-enabled Docker container. Can I simply prepare a Docker container with GPU support and Python, and somehow tell NiFi (or Airflow) to execute the operations in that container? I don't even know whether that is possible.
Another question is about processing data in real time as it lands. Will Kafka and Druid suffice, or should I consider Spark Streaming? I am looking into transforming the data, passing it through the models, etc., and potentially sending POST requests to an API depending on the results.
I am used to working only in a development environment (Jupyter), so when it comes to putting things into production, I have many gaps in how things work. The purpose of this project is therefore to practice how different components work together and to learn different technologies (NiFi, Kafka, Druid, etc.).
I hope you can help me.
Thanks in advance.
Upvotes: 2
Views: 191
Reputation: 29
To run a task in a specific container, the easiest option is Apache Airflow's DockerOperator. Typically you provide a CLI that starts the training, and Airflow calls that CLI inside the container. Ref: https://airflow.apache.org/docs/apache-airflow-providers-docker/stable/_api/airflow/providers/docker/operators/docker/index.html
Upvotes: 0