Reputation: 10565
I have been able to procure 4 physical machines to set up a Spark test cluster. The data will be stored in Cassandra, and computation will be done with Spark (SQL and DataFrames). I am planning on using Mesos because, as a developer, I want to do as little infrastructure work as possible.
However, almost all the tutorials I have found are from Mesosphere and use their commercial DC/OS infrastructure. I was able to configure the DC/OS CLI to use Marathon, but one of the Mesosphere support people told me that it may not work very well.
I was able to get Cassandra installed, but Marathon tells me that its status is 'unhealthy.' Spark doesn't even get that far: Marathon tells me that the deployment task is failing, but there are no logs, no error messages, nothing.
Is it just a bad idea to use Mesos? Is there an alternative? Are there any other resources on how to get Cassandra and Spark running? I don't mind purchasing books.
Update: I am running CentOS 7 on all four machines. Each machine has over 20 GB of RAM, 12 CPUs, and about a terabyte of disk. One of them is set up as the master node (running ZooKeeper and the Mesos master); the remaining machines are slaves/clients.
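For a single-master layout like the one described, the bare-metal startup commands look roughly like the sketch below. The hostname `master.local` is a placeholder, and `--quorum=1` matches a setup with exactly one master; adjust both for your environment.

```shell
# On the master node (runs ZooKeeper and the Mesos master).
# "master.local" is a placeholder for the master's actual hostname.
mesos-master --zk=zk://master.local:2181/mesos \
             --quorum=1 \
             --work_dir=/var/lib/mesos \
             --hostname=master.local

# On each of the three remaining machines (slaves/agents):
mesos-slave --master=zk://master.local:2181/mesos \
            --work_dir=/var/lib/mesos
```

With only one master, a quorum of 1 is correct; if you later add more masters for HA, the quorum must be a strict majority of the master count.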
Upvotes: 0
Views: 399
Reputation: 31479
Well, there are a few good articles on how to install a cluster, like
Unfortunately, you don't give many details about your environment, such as the OS you're using.
Personally, I run Mesos on a CoreOS cluster in a completely Dockerized manner, meaning that the Mesos master and slaves also run in containers. If you're interested, have a look at
to see my systemd setup for running Mesos on CoreOS.
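As an illustration of the idea (the unit path, image tag, and hostname below are assumptions for the sketch, not my actual files), a minimal systemd unit that runs a Dockerized Mesos master might look like this:

```ini
# /etc/systemd/system/mesos-master.service  (hypothetical unit)
[Unit]
Description=Mesos Master (Docker)
After=docker.service
Requires=docker.service

[Service]
Restart=always
# Remove any stale container from a previous run ("-" ignores failure).
ExecStartPre=-/usr/bin/docker rm -f mesos-master
ExecStart=/usr/bin/docker run --name mesos-master --net=host \
    mesosphere/mesos-master:0.28.1 \
    --zk=zk://master.local:2181/mesos --quorum=1 --work_dir=/var/lib/mesos
ExecStop=/usr/bin/docker stop mesos-master

[Install]
WantedBy=multi-user.target
```

The slave/agent units follow the same pattern with the agent image and `--master=zk://...` instead of the master flags.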
Concerning Spark, there are several ways to get it running on Mesos. Have a look at the Spark docs at
to get an idea. Furthermore, you can run Spark Jobserver in a Docker container, which will then act as a client application for your Spark jobs (with a REST API etc.). The Dockerfile/image are available under
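The main thing Spark needs from Mesos is the master URL; with ZooKeeper-based master discovery it has the form `mesos://zk://host:2181/mesos`. A small helper sketching how that URL is assembled (the function name and hosts are illustrative, not from any library):

```python
def mesos_master_url(zk_hosts, zk_port=2181, zk_path="/mesos"):
    """Build the HA master URL Spark expects when Mesos masters
    are discovered via ZooKeeper, e.g. for spark-submit --master."""
    hosts = ",".join("%s:%d" % (h, zk_port) for h in zk_hosts)
    return "mesos://zk://" + hosts + zk_path

# Single-master cluster, as in the question:
print(mesos_master_url(["master.local"]))
# -> mesos://zk://master.local:2181/mesos
```

You would pass the resulting string to `spark-submit --master ...` or set it as `spark.master` in the Spark configuration.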
To run Cassandra as a framework on Mesos, have a look at
Upvotes: 0