Paul Verest

Reputation: 64002

Run Hadoop with Docker (for both DEV and PROD environments)

What is the easiest way to run Hadoop with Docker, in a setup that works for both development and production environments?

That is, between the local development environment and the real environment, the only difference should be the destination machine.

P.S. Related to

and many questions tagged hadoop+docker: https://stackoverflow.com/questions/tagged/hadoop+docker

Upvotes: 4

Views: 1590

Answers (1)

Elton Stoneman

Reputation: 19184

There are a few Hadoop images on Docker Hub, but if you want something suitable for different environments, you'll want to run a fully distributed cluster - i.e. one container for the HDFS and YARN master node, and multiple containers for the worker nodes.

I have an image that works like that, which you can use as a starting point: sixeyed/hadoop-dotnet. You can see from the Dockerfile that it starts from the Java base image, installs Hadoop, and uses a startup script so containers can be run as either a master or a worker.
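The master/worker startup-script pattern can be sketched roughly like this (a simplified illustration only - this is not the actual script from that repository, and the commented-out daemon commands assume a standard Hadoop layout):

```shell
#!/bin/sh
# Sketch of a container entrypoint that starts a Hadoop node as master or worker,
# depending on the first argument passed to `docker run`.

role="$1"

start_role() {
  case "$1" in
    master)
      # Master node: run the HDFS NameNode and YARN ResourceManager, e.g.:
      #   hdfs namenode -format -nonInteractive
      #   hdfs --daemon start namenode
      #   yarn --daemon start resourcemanager
      echo "starting namenode and resourcemanager"
      ;;
    worker)
      # Worker node: run the HDFS DataNode and YARN NodeManager, e.g.:
      #   hdfs --daemon start datanode
      #   yarn --daemon start nodemanager
      echo "starting datanode and nodemanager"
      ;;
    *)
      echo "usage: $0 master|worker" >&2
      return 1
      ;;
  esac
}

if [ -n "$role" ]; then
  start_role "$role"
fi
```

Running the same image with a different argument then selects the role, which is what makes the single image reusable for every node in the cluster.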

That means you can run a distributed cluster with Docker, using the latest networking stack:

docker network create hadoop

docker run -d -p 50070:50070 -p 8088:8088 \
    --network hadoop --name hadoop-dotnet-master \
    sixeyed/hadoop-dotnet master

docker run -d -p 50075:50075 -p 8142:8042 -p 19888:19888 \
    --network hadoop \
    sixeyed/hadoop-dotnet worker

Or you can run a fully-distributed cluster with a Docker Compose file.
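A minimal sketch of such a Compose file might look like the following (the service names and published ports are assumptions mirroring the `docker run` commands above, not the actual file from the repository):

```yaml
version: "2"

services:
  hadoop-dotnet-master:
    image: sixeyed/hadoop-dotnet
    command: master
    ports:
      - "50070:50070"   # HDFS NameNode web UI
      - "8088:8088"     # YARN ResourceManager web UI
    networks:
      - hadoop

  hadoop-dotnet-worker:
    image: sixeyed/hadoop-dotnet
    command: worker
    ports:
      - "50075:50075"   # DataNode web UI
    networks:
      - hadoop

networks:
  hadoop:
```

With `docker-compose up -d` this brings up the same topology as the manual commands, and additional workers can be started with `docker-compose scale` (omitting the fixed worker port mappings if you scale past one worker, to avoid host-port conflicts).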

That image also includes .NET Core, but you can cut out that part if you're not using it.

Upvotes: 3
