Reputation: 64002
What is the easiest way to run Hadoop with Docker that works for both dev and real environments?
That is, the only difference between the local development environment and the real environment should be the destination machine.
P.S. related to
and many https://stackoverflow.com/questions/tagged/hadoop+docker
Upvotes: 4
Views: 1590
Reputation: 19184
There are a few Hadoop images on Docker Hub, but if you want something suitable for different environments, you'll want to run fully distributed - i.e. with a container for the HDFS and YARN master node, and multiple containers for the worker nodes.
I have an image that works that way, which you can use as a starting point: sixeyed/hadoop-dotnet. As the Dockerfile shows, it starts from the Java base image, installs Hadoop, and uses a startup script so that a container can run as either a master or a worker.
That means you can run a distributed cluster with Docker on a user-defined network, where containers can reach each other by name:
docker network create hadoop
docker run -d -p 50070:50070 -p 8088:8088 \
--network hadoop --name hadoop-dotnet-master \
sixeyed/hadoop-dotnet master
docker run -d -p 50075:50075 -p 8142:8042 -p 19888:19888 \
--network hadoop \
sixeyed/hadoop-dotnet worker
Or you can run a fully-distributed cluster with a Docker Compose file.
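As a rough sketch, a Compose file mirroring the two docker run commands might look like the one below. The image name, the master/worker arguments, and the port mappings are taken from the commands above; the service names and the file layout are my own assumptions, not something published with the image. I have kept the master's service name the same as the container name used above, on the assumption that the worker's startup script looks the master up by that hostname.

```yaml
# docker-compose.yml: hypothetical sketch, not an official file for this image.
services:
  hadoop-dotnet-master:          # assumed service name, matches the --name used above
    image: sixeyed/hadoop-dotnet
    command: master              # same argument as in the docker run command
    ports:
      - "50070:50070"
      - "8088:8088"
    networks:
      - hadoop

  hadoop-dotnet-worker:          # assumed service name
    image: sixeyed/hadoop-dotnet
    command: worker
    ports:
      - "50075:50075"
      - "8142:8042"
      - "19888:19888"
    networks:
      - hadoop

networks:
  hadoop:                        # Compose creates this network, prefixed with the project name
```

With that file in place, docker compose up -d should start both containers. To run several workers you would probably need to drop the worker's fixed host port mappings first, since two containers cannot publish the same host port, and then use something like docker compose up -d --scale hadoop-dotnet-worker=3.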
That image also includes .NET Core, but you can cut out that part if you're not using it.
Upvotes: 3