NarūnasK

Reputation: 4950

Ridiculously slow ZFS

Am I misinterpreting the iostat results, or is it really writing just 3.06 MB per minute?

# zpool iostat -v 60

               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
zfs-backup   356G   588G    465     72  1.00M  3.11M
  xvdf       356G   588G    465     72  1.00M  3.11M
----------  -----  -----  -----  -----  -----  -----

               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
zfs-backup   356G   588G    568     58  1.26M  3.06M
  xvdf       356G   588G    568     58  1.26M  3.06M
----------  -----  -----  -----  -----  -----  -----

Currently rsync is writing files over from another HDD (ext4). Based on our file characteristics (~50 KB files), the math seems to check out: 3.06 × 1024 / 58 ≈ 54 KB per write.
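For what it's worth, a rough way to confirm the average file size on the source side (the path is a placeholder for the actual rsync source directory):

# count files and compute the average size on the source (ext4) side
find /srv/docstore -type f -printf '%s\n' |
  awk '{ sum += $1; n++ } END { if (n) printf "%d files, avg %.1f KB\n", n, sum / n / 1024 }'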

For the record:

The server is on EC2, currently with 1 core and 2 GB RAM (t2.small); the HDD is the cheapest volume Amazon offers. The OS is Debian Jessie, with zfs-dkms installed from the Debian testing repository.

If it's really that slow, then why? Is there a way to improve performance without moving everything to SSD and adding 8 GB of RAM? Can it perform well on a VPS at all, or was ZFS designed with bare metal in mind?

EDIT

I've added a 5 GB general purpose SSD to be used as the ZIL, as suggested in the answers. That didn't help much, as the ZIL doesn't seem to be used at all. 5 GB should be more than plenty for my use case, since according to the Oracle article it needs to be at most about half the size of the RAM.
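For reference, this is roughly how the log device was attached and how the sync behaviour can be inspected (reconstructed from the output below, so treat it as a sketch):

# attach the SSD as a separate log (SLOG) device
zpool add zfs-backup log xvdg
# a separate log is only used for synchronous writes, so check what the pool does
zfs get sync,logbias zfs-backup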

               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
zfs-backup   504G   440G     47     36   272K  2.74M
  xvdf       504G   440G     47     36   272K  2.74M
logs            -      -      -      -      -      -
  xvdg          0  4.97G      0      0      0      0
----------  -----  -----  -----  -----  -----  -----

               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
zfs-backup   504G   440G     44     37   236K  2.50M
  xvdf       504G   440G     44     37   236K  2.50M
logs            -      -      -      -      -      -
  xvdg          0  4.97G      0      0      0      0
----------  -----  -----  -----  -----  -----  -----

EDIT

A dd test shows pretty decent speed.

# dd if=/dev/zero of=/mnt/zfs/docstore/10GB_test bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 29.3561 s, 366 MB/s
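(Note that the test above writes zeros asynchronously, so it largely measures caching - and, if compression is enabled, highly compressible - throughput. A synchronous variant along these lines would actually exercise the log device; the flag assumes GNU coreutils dd:)

# force every write to be committed to stable storage before dd continues
dd if=/dev/zero of=/mnt/zfs/docstore/1GB_sync_test bs=1M count=1024 oflag=dsync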

However, the iostat output hasn't changed much bandwidth-wise. Note the higher number of write operations.

# zpool iostat -v 10
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
zfs-backup   529G   415G      0     40  1.05K  2.36M
  xvdf       529G   415G      0     40  1.05K  2.36M
logs            -      -      -      -      -      -
  xvdg          0  4.97G      0      0      0      0
----------  -----  -----  -----  -----  -----  -----

               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
zfs-backup   529G   415G      2    364  3.70K  3.96M
  xvdf       529G   415G      2    364  3.70K  3.96M
logs            -      -      -      -      -      -
  xvdg          0  4.97G      0      0      0      0
----------  -----  -----  -----  -----  -----  -----

               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
zfs-backup   529G   415G      0    613      0  4.48M
  xvdf       529G   415G      0    613      0  4.48M
logs            -      -      -      -      -      -
  xvdg          0  4.97G      0      0      0      0
----------  -----  -----  -----  -----  -----  -----

               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
zfs-backup   529G   415G      0    490      0  3.67M
  xvdf       529G   415G      0    490      0  3.67M
logs            -      -      -      -      -      -
  xvdg          0  4.97G      0      0      0      0
----------  -----  -----  -----  -----  -----  -----

               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
zfs-backup   529G   415G      0    126      0  2.77M
  xvdf       529G   415G      0    126      0  2.77M
logs            -      -      -      -      -      -
  xvdg          0  4.97G      0      0      0      0
----------  -----  -----  -----  -----  -----  -----

               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
zfs-backup   529G   415G      0     29    460  1.84M
  xvdf       529G   415G      0     29    460  1.84M
logs            -      -      -      -      -      -
  xvdg          0  4.97G      0      0      0      0
----------  -----  -----  -----  -----  -----  -----

Upvotes: 0

Views: 2625

Answers (2)

user121391

Reputation: 617

Can it perform well on VPS at all, or was ZFS designed with bare metal in mind?

Yes to both.

Originally it was designed for bare metal, and that is where you naturally get the best performance and the full feature set (otherwise you have to trust the underlying storage, for example that writes are really committed to disk when synchronous writes are requested). It is quite flexible, though, as your vdevs can consist of any files or devices you have available - of course, performance can only be as good as the underlying storage.
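For example, a throwaway pool can be backed by a plain file (a sketch with made-up names, just to illustrate the flexibility):

# create a 1 GB sparse file and build a test pool on top of it
truncate -s 1G /tmp/zfs-test.img
zpool create testpool /tmp/zfs-test.img
zpool status testpool
# clean up afterwards
zpool destroy testpool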

Some points for consideration:

  • Moving files between different ZFS file systems is always a full copy/remove, not just a rearranging of links (this does not apply to your case, but may in the future)
  • Sync writing is much, much slower than async (ZFS has to wait for every single request to be committed and cannot queue the writes in the usual fashion*), and can only be sped up by moving the ZFS intent log to a dedicated vdev suited for high write IOPS, low latency and high endurance (in most cases this will be an SLC SSD or similar, but it could be any device different from the devices already in the pool). A system with normal disks that can easily saturate 110 MB/s async might have sync performance of about 0.5 to 10 MB/s (depending on vdevs) without separating the ZIL onto a dedicated SLOG device. Therefore I would not consider your values out of the ordinary.
  • Even with good hardware, ZFS will never be as fast as simpler file systems, because of the overhead for flexibility and safety. This was stated by Sun from the beginning and should not surprise you. If you value performance over everything else, choose something else.
  • The block size (recordsize) of the file system in question can affect performance, but I do not have reliable test numbers at hand.
  • More RAM will not help you much (above a low threshold of about 1 GB for the system itself), because it is used only as a read cache (unless you have deduplication enabled). A quick way to check these properties on your pool is sketched right after this list.
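A quick way to see which of these points apply to a given pool (a sketch; the pool name is taken from the question, the arcstats path assumes ZFS on Linux):

# properties that matter for this workload
zfs get sync,recordsize,compression,dedup,atime zfs-backup
# current ARC size and its upper limit
grep -E '^(size|c_max)' /proc/spl/kstat/zfs/arcstats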

Suggestions:

  • Use faster (virtual) disks for your pool
  • Separate the ZIL from your normal pool by using a different (virtual) device - preferably faster than the pool, but even a device of the same speed that is not shared with the other pool devices improves your case
  • Use async instead of sync writes and verify the data yourself afterwards, after the whole transaction or after sizeable chunks of it (see the sketch below)
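A minimal sketch of the last suggestion (the dataset name is a guess based on the mountpoint in the question, and /source/ is a placeholder for the rsync source):

# trade safety for speed: treat sync writes as async on this dataset
zfs set sync=disabled zfs-backup/docstore
# ...run the backup, then verify it yourself, e.g. with a checksumming dry run
rsync -rcni /source/ /mnt/zfs/docstore/
# revert when done
zfs set sync=standard zfs-backup/docstore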

*) To be more precise: in general, all small sync writes below a certain size are additionally collected in the ZIL before being written to disk from RAM, which happens either every five seconds or once about 4 GB have accumulated, whichever comes first (all of those parameters can be modified). This is done because:

  • writing from RAM to spinning disks every 5 seconds can be done as a continuous stream and is therefore faster than many small writes
  • in case of sudden power loss, the aborted in-flight transactions are stored safely in the ZIL and can be reapplied upon reboot. This works like a transaction log in a database and guarantees a consistent state of the file system (for old data) and also that no data that was to be written is lost (for new data).

Normally the ZIL resides on the pool itself, which should be protected by using redundant vdevs, making the whole operation very resilient against power loss, disk crashes, bit errors etc. The downside is that the pool disks need to do the small random writes first, before they can flush the same data to disk in a more efficient continuous transfer - therefore it is recommended to move the ZIL onto another device, usually called an SLOG device (Separate LOG device). This can be another disk, but an SSD performs much better at this workload (and will wear out pretty fast, as most transactions go through it). If you never experience a crash, your SSD will never be read, only written to.
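On ZFS on Linux, the timing and size thresholds mentioned above are exposed as module parameters (a sketch; parameter names can differ between versions):

# how often a transaction group is committed, in seconds
cat /sys/module/zfs/parameters/zfs_txg_timeout
# upper bound on dirty (not yet committed) data, in bytes
cat /sys/module/zfs/parameters/zfs_dirty_data_max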

Upvotes: 2

datasage

Reputation: 19573

This particular problem may be due to a noisy neighbor. Since it's a t2 instance, you end up with the lowest priority. In that case you can stop/start your instance to get placed on a new host.

Unless you are using instance storage (which is not really an option for t2 instances anyway), all disk writes go to what are essentially SAN volumes. The network interface to the EBS system is shared by all instances on the same host, and the size of the instance determines its priority.

If you are writing from one volume to another, you are passing all read and write traffic over the same interface.

There may be other factors at play, depending on which volume types you use and whether you have any CPU credits left on your t2 instance.
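A rough way to check the remaining CPU credits from the CLI (instance ID, region and time window are placeholders):

# CPUCreditBalance is reported per instance via CloudWatch
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 --metric-name CPUCreditBalance \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --start-time 2016-01-01T00:00:00Z --end-time 2016-01-01T06:00:00Z \
  --period 300 --statistics Average --region us-east-1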

Upvotes: 1
