Reputation: 622
I've set up a Mesos cluster using the CloudFormation templates from Mesosphere. Everything worked fine after the cluster launched.
I recently noticed that none of the slave nodes are listed in the Mesos dashboard. The EC2 console shows the slaves are running and passing health checks. I restarted the nodes in the cluster, but that didn't help.
I ssh'ed into one of the slaves and noticed that the mesos-slave service is not running. I ran sudo systemctl status dcos-mesos-slave.service
but the service would not start.
I looked in /var/log/mesos/,
ran tail -f mesos-slave.xxx.invalid-user.log.ERROR.20151127-051324.31267
and saw the following:
F1127 05:13:24.242182 31270 slave.cpp:4079] CHECK_SOME(state::checkpoint(path, bootId.get())): Failed to create temporary file: No space left on device
But the output of df -h
and free
shows there is plenty of disk space and memory left.
So why is it complaining about no space left on the device?
Upvotes: 0
Views: 1715
Reputation: 137
It is good practice to run
docker rmi -f $(docker images | grep "<none>" | awk "{print \$3}")
This frees space by deleting untagged (<none>) Docker images.
Upvotes: 0
Reputation: 622
OK, I figured it out.
When Mesos runs for a long time or under frequent load, /tmp can run out of space, since Mesos uses /tmp/mesos/ as its work_dir. The catch is that a filesystem can hold only a fixed number of file references (inodes), so it can report "No space left on device" once the inodes are used up, even while df -h still shows free space. In my case, the slaves were accumulating a large number of file chunks from image pulls in /var/lib/docker/tmp.
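To confirm that inodes rather than bytes are exhausted, something like the following can help. This is a minimal sketch, assuming GNU coreutils df and find on Linux; the helper names inode_usage and count_entries are made up for illustration and are not part of Mesos or DCOS.

```shell
# Print the inode usage percentage (e.g. "87%") of the filesystem holding a path.
inode_usage() {
    # -i reports inodes instead of blocks; -P keeps each filesystem on one line.
    df -iP "$1" | awk 'NR == 2 { print $5 }'
}

# For each immediate subdirectory of a path, print how many filesystem entries
# it contains (the subdirectory itself included), largest first.
count_entries() {
    for dir in "$1"/*/; do
        [ -d "$dir" ] || continue
        printf '%s\t%s\n' "$(find "$dir" -xdev | wc -l)" "$dir"
    done | sort -rn
}

inode_usage /tmp
```

A value near 100% from inode_usage while df -h still shows free space points to inode exhaustion. Running count_entries on a suspect directory such as /var/lib/docker (as root) then shows which subdirectory holds the most files.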
To resolve this issue:
1) Remove files under /tmp
2) Set a different work_dir location
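Step 1 could be scripted roughly as below. This is only a sketch, assuming GNU or BSD find (-mmin and -delete are not POSIX); remove_stale_files is a hypothetical helper name, and the age threshold is whatever you consider safe.

```shell
# Delete regular files under a directory that were last modified more than
# the given number of minutes ago. Directories are left in place.
remove_stale_files() {
    dir=$1
    minutes=$2
    find "$dir" -xdev -type f -mmin +"$minutes" -delete
}
```

For example, remove_stale_files /tmp 1440 run as root would drop files older than a day; it is probably wise to stop the slave service first so nothing in use is removed. For step 2, mesos-slave accepts a --work_dir flag; how the DCOS service passes it is not shown here.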
Upvotes: 1