Reputation: 35
I am trying to learn Cloud Foundry using bosh-lite on a MacBook Pro. I can get it running after starting from scratch, but every time it later stops working. I suspect this is associated with stopping the VirtualBox VM or putting the laptop to sleep, but I can't confirm that this is definitely the case.
My experience is limited, and I'm having difficulty not just resolving the issue but also understanding what is going wrong. Apologies if this is an obvious problem, but I haven't been able to work out how to stop it from happening. The only solution I've found so far is to destroy the deployment using Vagrant and start from scratch, which takes a while and surely isn't the optimal fix. :)
I've noticed that 'bosh vms' shows unresponsive agents and that they're not starting properly. The error from 'bosh cck' indicates a locking issue, but I suspect this may be misleading, as running 'bosh locks' shows that there are no locks. Once again, I'm a newbie, so this may simply be a misunderstanding ...
Help - how do I fix this? Is there a way to quickly 'reset' to a working state? ('vagrant reload --provision' doesn't help.) Where exactly is the issue?
Also, what is the (default) root password for the vagrant cloudfoundry/bosh-lite VM?
> bosh vms
+---------------------------------------------------------------------------+--------------------+-----+-----------+--------------+
| VM | State | AZ | VM Type | IPs |
+---------------------------------------------------------------------------+--------------------+-----+-----------+--------------+
| api_z1/0 (8dfeb143-59b1-46dd-9482-e90931a70a0d) | unresponsive agent | n/a | large_z1 | 10.244.0.138 |
| blobstore_z1/0 (7795ce02-d64e-4cc7-be1e-0e328384d568) | unresponsive agent | n/a | medium_z1 | 10.244.0.130 |
| consul_z1/0 (e92f6bfd-f623-4ba4-abf3-3d4baa0953fa) | unresponsive agent | n/a | small_z1 | 10.244.0.54 |
| doppler_z1/0 (049eaa18-3d4f-48d8-92ed-ea4b6a20cd29) | unresponsive agent | n/a | medium_z1 | 10.244.0.146 |
| etcd_z1/0 (e45a7648-e43d-4753-8a18-3ab21b86293d) | unresponsive agent | n/a | large_z1 | 10.244.0.42 |
| ha_proxy_z1/0 (ba6e8ce6-8f40-4868-8a71-c74119f173ea) | failing | n/a | router_z1 | 10.244.0.34 |
| hm9000_z1/0 (ff8ae6a3-1889-4fb0-aabf-072012cf9f48) | unresponsive agent | n/a | medium_z1 | 10.244.0.142 |
| loggregator_trafficcontroller_z1/0 (8f2e4ea1-dda7-4d15-9050-528338824e3b) | unresponsive agent | n/a | small_z1 | 10.244.0.150 |
| nats_z1/0 (9e4eab32-ac91-4f05-83be-b8189c2991e7) | unresponsive agent | n/a | medium_z1 | 10.244.0.6 |
| postgres_z1/0 (fb8d1eee-3ade-480e-aa01-3db26a64b447) | unresponsive agent | n/a | medium_z1 | 10.244.0.30 |
| router_z1/0 (f9ce017b-580f-4fce-b79d-01ceef190e19) | unresponsive agent | n/a | router_z1 | 10.244.0.22 |
| runner_z1/0 (c0b0871b-c672-46c8-ac4a-1aabd81864f6) | unresponsive agent | n/a | runner_z1 | 10.244.0.26 |
| uaa_z1/0 (63b4bfa7-499d-4dba-93f6-2017b04a7588) | unresponsive agent | n/a | medium_z1 | 10.244.0.134 |
+---------------------------------------------------------------------------+--------------------+-----+-----------+--------------+
> bosh cck
Acting as user 'admin' on deployment 'cf-warden' on 'Bosh Lite Director'
Performing cloud check...
Director task 96
Error 100: Unable to get deployment lock, maybe a deployment is in progress. Try again later.
Task 96 error
For a more detailed error report, run: bosh task 96 --debug
> bosh locks
Acting as user 'admin' on 'Bosh Lite Director'
No locks
It is possible to do a 'reset' and get it up and running again using the commands below, but this takes quite some time and is surely more of a 'hammer' than is required!
# bosh-lite dir
vagrant destroy && vagrant up
# cd cf-release dir
bosh upload release
bosh deploy
# cd bosh-lite dir
bin/add-route
cf api --skip-ssl-validation https://api.bosh-lite.com
cf create-org my_org
cf create-space development -o my_org
Upvotes: 0
Views: 1358
Reputation: 1
It is recommended to pause the BOSH Lite VM when it's not in use, so that it can simply be resumed after the system goes to sleep or is rebooted; otherwise the VM will be halted by the OS (the BOSH Lite VM goes into the aborted state). Running 'vagrant up' on an aborted BOSH Lite VM gets it running, but in that case the CF VMs go into an unresponsive state, which requires redeployment.
Running 'vagrant suspend' to pause and 'vagrant resume' when returning to work helps avoid the situation with unresponsive CF VMs.
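As a rough sketch, the pause/resume routine could be wrapped in two small shell helpers. The function names, the BOSH_LITE_DIR default path, and the VAGRANT override variable are assumptions made for illustration; only 'vagrant suspend' and 'vagrant resume' come from the advice above.

```shell
# Hypothetical helpers: the names and the default path are illustrative only.
BOSH_LITE_DIR="${BOSH_LITE_DIR:-$HOME/workspace/bosh-lite}"  # assumed checkout location
VAGRANT="${VAGRANT:-vagrant}"                                # overridable for dry runs

# Save the VM state before closing the lid or shutting down the host.
bosh_lite_pause() {
  (cd "$BOSH_LITE_DIR" && "$VAGRANT" suspend)
}

# Restore the saved VM state when returning to work.
bosh_lite_resume() {
  (cd "$BOSH_LITE_DIR" && "$VAGRANT" resume)
}
```

Sourcing this from a shell profile and calling bosh_lite_pause before sleeping the laptop should keep the VM out of the aborted state, provided the suspend finishes before the host goes down.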
Upvotes: 0
Reputation: 66
I usually do 'vagrant suspend' and then 'vagrant up' to avoid a situation with dead containers/VMs inside BOSH Lite.
You can run 'bosh cck', but in my experience simply recreating the deployment is much faster and also more reliable.
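One possible way to script the recreate step is sketched below. It assumes the old (v1) BOSH CLI, whose 'bosh deploy' accepts a --recreate flag, and it assumes that 'bosh vms' prints the literal text "unresponsive agent" for dead agents, as in the question's output. The function names and the BOSH override variable are hypothetical.

```shell
BOSH="${BOSH:-bosh}"  # overridable so the logic can be tried without a director

# Count lines on stdin that report an unresponsive agent.
# grep -c exits non-zero when the count is 0, hence the "|| true".
count_unresponsive() {
  grep -c 'unresponsive agent' || true
}

# Recreate the deployment only if at least one agent is unresponsive.
recreate_if_unresponsive() {
  n=$("$BOSH" vms | count_unresponsive)
  if [ "$n" -gt 0 ]; then
    "$BOSH" deploy --recreate
  else
    echo "all agents responsive"
  fi
}
```

This still relies on the deployment manifest being selected already, and a full recreate can take several minutes on a laptop.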
Upvotes: 0
Reputation: 1485
You can use 'sudo su' after SSHing into the bosh-lite VM with 'vagrant ssh' to become root without needing to enter a root password.
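For a single root command rather than an interactive root shell, something like the following might be enough. It uses the -c option of 'vagrant ssh' to run a command inside the VM; the helper name and the VAGRANT override variable are assumptions for illustration, and it must be run from the bosh-lite directory.

```shell
VAGRANT="${VAGRANT:-vagrant}"  # overridable for dry runs

# Hypothetical helper: run one command as root inside the bosh-lite VM.
# Must be called from the directory containing the bosh-lite Vagrantfile.
bosh_lite_as_root() {
  "$VAGRANT" ssh -c "sudo $*"
}
```

For example, 'bosh_lite_as_root whoami' would be expected to print root if passwordless sudo works as described.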
BOSH-lite has always been hard to resurrect after a VM reboot/sleep.
Someone recently (Dec 2016) wrote a utility to "gracefully put machines running BOSH Lite to sleep" and restore them on system wake, to address this:
https://github.com/henryaj/ambient
Upvotes: 0