Reputation: 666
I have a Vagrant guest that I use to run a Symfony 2 application locally for development. In general this works fine, but processes regularly lock up in the 'D+' state (uninterruptible sleep, typically waiting on I/O).
For example, when I try to run my unit tests:
./bin/phpunit -c app
The task launches, but then never exits. In the process list I see:
vagrant 3279 0.5 4.9 378440 101132 pts/0 D+ 02:43 0:03 php ./bin/phpunit -c app
The task is unkillable; I need to power cycle the Vagrant guest to get it back. This seems to happen mostly with PHP command-line apps (but those are also the main command-line tasks I run, so that might not be relevant).
The syslog reports a hung task:
Aug 20 03:04:40 precise64 kernel: [ 6240.210396] INFO: task php:3279 blocked for more than 120 seconds.
Aug 20 03:04:40 precise64 kernel: [ 6240.211920] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 20 03:04:40 precise64 kernel: [ 6240.212843] php D 0000000000000000 0 3279 3091 0x00000004
Aug 20 03:04:40 precise64 kernel: [ 6240.212846] ffff88007aa13c98 0000000000000082 ffff88007aa13c38 ffffffff810830df
Aug 20 03:04:40 precise64 kernel: [ 6240.212849] ffff88007aa13fd8 ffff88007aa13fd8 ffff88007aa13fd8 0000000000013780
Aug 20 03:04:40 precise64 kernel: [ 6240.212851] ffff88007aa9c4d0 ffff880079e596f0 ffff88007aa13c78 ffff88007fc14040
Aug 20 03:04:40 precise64 kernel: [ 6240.212853] Call Trace:
Aug 20 03:04:40 precise64 kernel: [ 6240.212859] [<ffffffff810830df>] ? queue_work+0x1f/0x30
Aug 20 03:04:40 precise64 kernel: [ 6240.212863] [<ffffffff811170e0>] ? __lock_page+0x70/0x70
Aug 20 03:04:40 precise64 kernel: [ 6240.212866] [<ffffffff8165a55f>] schedule+0x3f/0x60
Aug 20 03:04:40 precise64 kernel: [ 6240.212867] [<ffffffff8165a60f>] io_schedule+0x8f/0xd0
Aug 20 03:04:40 precise64 kernel: [ 6240.212869] [<ffffffff811170ee>] sleep_on_page+0xe/0x20
Aug 20 03:04:40 precise64 kernel: [ 6240.212871] [<ffffffff8165ae2f>] __wait_on_bit+0x5f/0x90
Aug 20 03:04:40 precise64 kernel: [ 6240.212873] [<ffffffff81117258>] wait_on_page_bit+0x78/0x80
Aug 20 03:04:40 precise64 kernel: [ 6240.212875] [<ffffffff8108af00>] ? autoremove_wake_function+0x40/0x40
Aug 20 03:04:40 precise64 kernel: [ 6240.212877] [<ffffffff8111736c>] filemap_fdatawait_range+0x10c/0x1a0
Aug 20 03:04:40 precise64 kernel: [ 6240.212882] [<ffffffff81122a01>] ? do_writepages+0x21/0x40
Aug 20 03:04:40 precise64 kernel: [ 6240.212884] [<ffffffff81118da8>] filemap_write_and_wait_range+0x68/0x80
Aug 20 03:04:40 precise64 kernel: [ 6240.212892] [<ffffffffa01269fe>] nfs_file_fsync+0x5e/0x130 [nfs]
Aug 20 03:04:40 precise64 kernel: [ 6240.212896] [<ffffffff811a632b>] vfs_fsync+0x2b/0x40
Aug 20 03:04:40 precise64 kernel: [ 6240.212900] [<ffffffffa01272c3>] nfs_file_flush+0x53/0x80 [nfs]
Aug 20 03:04:40 precise64 kernel: [ 6240.212903] [<ffffffff81175d6f>] filp_close+0x3f/0x90
Aug 20 03:04:40 precise64 kernel: [ 6240.212905] [<ffffffff81175e72>] sys_close+0xb2/0x120
Aug 20 03:04:40 precise64 kernel: [ 6240.212907] [<ffffffff81664a82>] system_call_fastpath+0x16/0x1b
To provision the box, I'm sharing a local folder using:
config.vm.synced_folder "/my/local/path.dev", "/var/www", :nfs => true
Vagrant creates the following /etc/exports file on the OSX host:
# VAGRANT-BEGIN: c7d0c56a-a126-46f5-a293-605bf554bc9a
"/Users/djdrey-local/Sites/oddswop.dev" 192.168.33.101 -mapall=501:20
# VAGRANT-END: c7d0c56a-a126-46f5-a293-605bf554bc9a
Output of nfsstat on the Vagrant guest:
Server rpc stats:
calls      badcalls   badclnt    badauth    xdrcall
0          0          0          0          0
Client rpc stats:
calls      retrans    authrefrsh
87751      0          87751
Client nfs v3:
null         getattr      setattr      lookup       access       readlink
0         0% 35018    39% 1110      1% 8756      9% 19086    21% 0         0%
read         write        create       mkdir        symlink      mknod
5100      5% 7059      8% 4603      5% 192       0% 0         0% 0         0%
remove       rmdir        rename       link         readdir      readdirplus
4962      5% 262       0% 313       0% 0         0% 0         0% 1056      1%
fsstat       fsinfo       pathconf     commit
1         0% 2         0% 1         0% 229       0%
I've ensured the Guest Additions are up to date on the guest using the vagrant-vbguest plugin.
I'm not sure how to go about debugging this. It seems pretty clear to me that this is an NFS issue between the guest and the Mac OS X host. If I try to increase the NFS debug logging on OS X using NFS Manager, I get a kernel panic in OS X.
Has anyone else had a similar issue? Any suggestions on a way forward would be appreciated, as power cycling the guest several times per day is unworkable.
Environment
Upvotes: 4
Views: 2082
Reputation: 1534
I had a similar problem when running npm install within a shared NFS folder, and found that setting nfs_udp to false (so the share is mounted over TCP instead of UDP) fixed the hangs:
config.vm.synced_folder ".", "/vagrant", type: "nfs", nfs_udp: false
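For context, here is a minimal sketch of how that line might sit in a complete Vagrantfile for the setup in the question. The folder paths and the IP address come from the question; the box name is an assumption guessed from the guest hostname in the syslog output, so substitute your own.

```ruby
# Vagrantfile (sketch): NFS synced folder forced onto TCP.
Vagrant.configure("2") do |config|
  # Assumption: box name guessed from the guest hostname in the question's syslog.
  config.vm.box = "precise64"

  # NFS synced folders need a host-only (private) network; this IP is the
  # one that appears in the question's /etc/exports.
  config.vm.network "private_network", ip: "192.168.33.101"

  # nfs_udp: false asks Vagrant to mount the share over TCP instead of UDP.
  config.vm.synced_folder "/my/local/path.dev", "/var/www",
    type: "nfs",
    nfs_udp: false
end
```

After changing the file, `vagrant reload` should remount the share; running `mount | grep nfs` inside the guest should then show `proto=tcp` among the mount options.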
Upvotes: 1
Reputation: 97
You don't give enough detail on the specific configuration (e.g., the exports file, the fstab file, firewall config, etc.) for a specific answer. Here are some ideas though:
In the fstab, try adding the "hard,intr" flags to the mount options. This is meant to make it possible to kill processes waiting for I/O on a dead mount (note that on kernels newer than 2.6.25 the intr option is deprecated and ignored, and only SIGKILL can interrupt a pending NFS operation).
Also make sure your firewall is open for RPC calls and that the rpc-statd service is running.
Also figure out which version of NFS you're running and check that you have the correct TCP/UDP ports open. If NFS v4 isn't working out, maybe try NFS v3.
Finally, are you connecting via IP address or hostname? Hostname is great, but make sure it always resolves correctly -- maybe in your /etc/hosts file. Alternatively, hard-code the IP addresses so there is no chance of name resolution failing...
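With Vagrant, the share is normally mounted by Vagrant itself rather than from a hand-edited /etc/fstab, so the mount options suggested above would go into the synced folder definition instead. The following is only a sketch under that assumption, reusing the paths and IP address from the question. Whether explicit `mount_options` are merged with or replace the options Vagrant derives from `nfs_version` and `nfs_udp` may depend on the Vagrant version, so everything is spelled out and the result should be checked with `mount` in the guest.

```ruby
# Vagrantfile (sketch): explicit NFS mount options for the synced folder.
Vagrant.configure("2") do |config|
  config.vm.network "private_network", ip: "192.168.33.101"

  config.vm.synced_folder "/my/local/path.dev", "/var/www",
    type: "nfs",
    # Assumption: all options are spelled out here in case explicit
    # mount_options replace the ones Vagrant would otherwise derive.
    #   vers=3 : pin NFS v3 (the version shown in the question's nfsstat)
    #   tcp    : use TCP as the transport
    #   hard   : keep retrying instead of returning I/O errors
    #   intr   : allow signals to interrupt a blocked call (may be ignored
    #            by newer kernels)
    mount_options: ["vers=3", "tcp", "hard", "intr"]
end
```

Using the guest's fixed private-network IP, as here, also sidesteps the name-resolution concern, since the host's exports entry refers to that address directly.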
Upvotes: 0