Andre Lackmann
Andre Lackmann

Reputation: 666

NFS hang from Vagrant guest to OSX

I have a Vagrant guest I'm using to run a Symfony 2 application locally for development. In general this is working fine, however, I am regularly finding the processes lock in the 'D+' state (waiting for I/O).

eg. I try to run my unit tests:

./bin/phpunit -c app

The task launches, but then never exits. In the process list I see:

vagrant 3279 0.5 4.9 378440 101132 pts/0 D+ 02:43 0:03 php ./bin/phpunit -c app

The task is unkillable. I need to power cycle the Vagrant guest to get it back again. This seems to happen mostly with PHP command line apps (but it's also the main command line tasks I do, so it might not be relevant).

The syslog reports a hung task:

Aug 20 03:04:40 precise64 kernel: [ 6240.210396] INFO: task php:3279 blocked for more than 120 seconds.
Aug 20 03:04:40 precise64 kernel: [ 6240.211920] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 20 03:04:40 precise64 kernel: [ 6240.212843] php             D 0000000000000000     0  3279   3091 0x00000004
Aug 20 03:04:40 precise64 kernel: [ 6240.212846]  ffff88007aa13c98 0000000000000082 ffff88007aa13c38 ffffffff810830df
Aug 20 03:04:40 precise64 kernel: [ 6240.212849]  ffff88007aa13fd8 ffff88007aa13fd8 ffff88007aa13fd8 0000000000013780
Aug 20 03:04:40 precise64 kernel: [ 6240.212851]  ffff88007aa9c4d0 ffff880079e596f0 ffff88007aa13c78 ffff88007fc14040
Aug 20 03:04:40 precise64 kernel: [ 6240.212853] Call Trace:
Aug 20 03:04:40 precise64 kernel: [ 6240.212859]  [<ffffffff810830df>] ? queue_work+0x1f/0x30
Aug 20 03:04:40 precise64 kernel: [ 6240.212863]  [<ffffffff811170e0>] ? __lock_page+0x70/0x70
Aug 20 03:04:40 precise64 kernel: [ 6240.212866]  [<ffffffff8165a55f>] schedule+0x3f/0x60
Aug 20 03:04:40 precise64 kernel: [ 6240.212867]  [<ffffffff8165a60f>] io_schedule+0x8f/0xd0
Aug 20 03:04:40 precise64 kernel: [ 6240.212869]  [<ffffffff811170ee>] sleep_on_page+0xe/0x20
Aug 20 03:04:40 precise64 kernel: [ 6240.212871]  [<ffffffff8165ae2f>] __wait_on_bit+0x5f/0x90
Aug 20 03:04:40 precise64 kernel: [ 6240.212873]  [<ffffffff81117258>] wait_on_page_bit+0x78/0x80
Aug 20 03:04:40 precise64 kernel: [ 6240.212875]  [<ffffffff8108af00>] ? autoremove_wake_function+0x40/0x40
Aug 20 03:04:40 precise64 kernel: [ 6240.212877]  [<ffffffff8111736c>] filemap_fdatawait_range+0x10c/0x1a0
Aug 20 03:04:40 precise64 kernel: [ 6240.212882]  [<ffffffff81122a01>] ? do_writepages+0x21/0x40
Aug 20 03:04:40 precise64 kernel: [ 6240.212884]  [<ffffffff81118da8>] filemap_write_and_wait_range+0x68/0x80
Aug 20 03:04:40 precise64 kernel: [ 6240.212892]  [<ffffffffa01269fe>] nfs_file_fsync+0x5e/0x130 [nfs]
Aug 20 03:04:40 precise64 kernel: [ 6240.212896]  [<ffffffff811a632b>] vfs_fsync+0x2b/0x40
Aug 20 03:04:40 precise64 kernel: [ 6240.212900]  [<ffffffffa01272c3>] nfs_file_flush+0x53/0x80 [nfs]
Aug 20 03:04:40 precise64 kernel: [ 6240.212903]  [<ffffffff81175d6f>] filp_close+0x3f/0x90
Aug 20 03:04:40 precise64 kernel: [ 6240.212905]  [<ffffffff81175e72>] sys_close+0xb2/0x120
Aug 20 03:04:40 precise64 kernel: [ 6240.212907]  [<ffffffff81664a82>] system_call_fastpath+0x16/0x1b`

To provision the box, I'm sharing a local folder using:

config.vm.synced_folder "/my/local/path.dev", "/var/www", :nfs => true

Vagrant creates the following /etc/exports file on the OSX host:

# VAGRANT-BEGIN: c7d0c56a-a126-46f5-a293-605bf554bc9a
"/Users/djdrey-local/Sites/oddswop.dev" 192.168.33.101 -mapall=501:20
# VAGRANT-END: c7d0c56a-a126-46f5-a293-605bf554bc9a

Output of nfsstat on the vagrant guest

Server rpc stats:
calls      badcalls   badclnt    badauth    xdrcall
0          0          0          0          0

Client rpc stats:
calls      retrans    authrefrsh
87751      0          87751

Client nfs v3:
null         getattr      setattr      lookup       access       readlink
0         0% 35018    39% 1110      1% 8756      9% 19086    21% 0         0%
read         write        create       mkdir        symlink      mknod
5100      5% 7059      8% 4603      5% 192       0% 0         0% 0         0%
remove       rmdir        rename       link         readdir      readdirplus
4962      5% 262       0% 313       0% 0         0% 0         0% 1056      1%
fsstat       fsinfo       pathconf     commit
1         0% 2         0% 1         0% 229       0%

I've ensured the Guest Additions are up to date on the guest using the plugin: vagrant-vbguest

I'm not sure how to go about debugging this. It's pretty clear to me this is a NFS issue between the guest and the Mac OSX host. If I try and up the debug logging for NFS on OSX using NFS Manager, I get a kernel panic in OSX.

Has anyone else had a similar issue? Any suggestions on a way forward would be appreciated - as power cycling the guest several times per day is unworkable.

Environment

Upvotes: 4

Views: 2082

Answers (2)

Ben Dyer
Ben Dyer

Reputation: 1534

I had a similar problem when running npm install within a shared nfs folder and subsequently found that disabling nfs_udp fixed the hanging issues :

 config.vm.synced_folder ".", "/vagrant", type: "nfs", nfs_udp: false

Upvotes: 1

Steve Moon
Steve Moon

Reputation: 97

You don't give enough detail on the specific configuration (e.g., the exports file, the fstab file, firewall config, etc.) for a specific answer. Here are some ideas though:

In the fstab try adding the "hard,intr" flags to the mount options -- this makes it possible to kill processes waiting for I/O on a dead mount.

Also make sure your firewall is open for rpc calls and the rpc-statd service is running.

Also figure out what version of nfs you're running and that you have the correct TCP/UDP ports open. If NFS v4 isn't working out, maybe try NFS v3.

Finally, are you connecting via IP address or hostname? Hostname is great, but make sure it always resolves correctly -- maybe in your /etc/hosts file. Alternatively, hard-code the IP addresses so there is no chance of name resolution failing...

Upvotes: 0

Related Questions