Reputation: 6372
On this website: http://www.patpro.net/blog/index.php/2014/03/19/2628-zfs-primarycache-all-versus-metadata/
The author shows that by switching primarycache
between all and metadata, he gets wildly different read performance when running his antivirus scan.
He also shows that the amount of data read from disk differs enormously between the two settings.
I create 2 brand new datasets, both with primarycache=none and compression=lz4, and I copy in each one a 4.8GB file (2.05x compressratio). Then I set primarycache=all on the first one, and primarycache=metadata on the second one. I cat the first file into /dev/null with zpool iostat running in another terminal. And finally, I cat the second file the same way.
The sum of read bandwidth column is (almost) exactly the physical size of the file on the disk (du output) for the dataset with primarycache=all: 2.44GB. For the other dataset, with primarycache=metadata, the sum of the read bandwidth column is ...wait for it... 77.95GB.
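In shell terms, the setup he describes is roughly this (a minimal sketch of my own; the pool name, dataset names and file name are placeholders, not from the blog post):

# Two fresh datasets, with data caching disabled while the file is copied in
zfs create -o primarycache=none -o compression=lz4 tank/test-all
zfs create -o primarycache=none -o compression=lz4 tank/test-meta
cp bigfile.bin /tank/test-all/
cp bigfile.bin /tank/test-meta/

# Switch on the two cache modes being compared
zfs set primarycache=all      tank/test-all
zfs set primarycache=metadata tank/test-meta

# With "zpool iostat tank 3" running in another terminal:
cat /tank/test-all/bigfile.bin  > /dev/null
cat /tank/test-meta/bigfile.bin > /dev/null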
He then quotes an anonymous user who explained it like this:
clamscan reads a file, gets 4k (pagesize?) of data and processes it, then it reads the next 4k, etc.
ZFS, however, cannot read just 4k. It reads 128k (recordsize) by default. Since there is no cache (you've turned it off) the rest of the data is thrown away.
128k / 4k = 32
32 x 2.44GB = 78.08GB
I don't quite understand the anonymous user's explanation. I'm still confused as to why there is such a big difference in the read bandwidth.
So why does this ZFS experiment show that when primarycache
is all, the read bandwidth is 2.44 GB, but when it is just metadata, it's 77.95 GB? And what are the implications for tuning ZFS? If the author had reduced his recordsize, would he get a different result?
What about the claim that ZFS's recordsize is variable?
Upvotes: 1
Views: 5119
Reputation: 56
The test that the blogger, Patrick, ran was to "cat" the 4.8 GB file (compressed to 2.44 GB) to /dev/null and watch how long it took for the file to be read.
The key is that "primarycache=metadata" might as well mean "cache=off," because none of the actual file data will be stored in the cache. When "primarycache=all," the system reads the whole file once and stores it in the cache (the ARC in RAM, spilling into an L2ARC on SSD if one is configured). When "cat" or "clamscan" look for the file, they can find it there, and it doesn't need to be read again from disk.
As cat writes the file to /dev/null, it doesn't write it out in a single 2.44 GB chunk; it reads a little bit, writes it, checks the cache for the next bit, writes a little more, and so on.
With cache off, that file will need to be re-read from disk a ridiculous amount of times as it's written to /dev/null (or stdout, wherever) -- that's the logic of "128k/4k = 32".
ZFS writes files on disk in 128k blocks, but the forum posters found that "clamscan" (and "cat", at least on this user's FreeBSD box) processes data in 4k blocks. So, without a cache, each 128k block will have to be served up 32 times instead of just once. (clamscan pulls block #1, 128k large, uses the first 4k; needs block #1 again, since there's no cache it reads the block from disk again; takes the second 4k, throws the rest out; etc.)
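You can sanity-check that factor with nothing more than shell arithmetic (the 2.44 GB figure is the file's physical size from the blog post):

echo $((131072 / 4096))   # 128k recordsize / 4k read size = 32 reads per record
echo '32 * 2.44' | bc     # ~78 GB of physical reads for a 2.44 GB file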
The upshot is:
[1] Maybe never do "primarycache=metadata", for any reason.
[2] When block sizes are mismatched like this, performance issues can result. If clamscan read 128k blocks, there would be no (significant?) difference on a single read of the file. On the other hand, if you need the file again shortly afterwards, a cache would still hold its data blocks and it wouldn't need to be pulled from disk again.
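For reference, both knobs are ordinary dataset properties; something like this (the dataset name is a placeholder):

zfs get primarycache,recordsize tank/dataset   # check the current values
zfs set primarycache=all tank/dataset          # keep data caching on (the default)
zfs set recordsize=16k tank/dataset            # only affects blocks written after the change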
...
Here are some tests inspired by the forum post to illustrate. The examples take place on a ZFS dataset with the record size set to 128k (the default) and primarycache set to metadata; a 1G dummy file is copied at different block sizes: 128k first, then 4k, then 8k. (Scroll to the right; I've lined up my copy commands with the iostat readout.)
Notice how dramatically the ratio of reads to writes balloons and the read bandwidth takes off when the block sizes are mismatched.
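To make the captures easier to follow, these are the commands that appear inline on the right of the iostat output below (same dataset settings as described above):

mkfile 1G test1.tst                  # create the 1G dummy file
dd if=test1.tst of=test1.2 bs=128k   # copy with a block size matching the 128k recordsize
dd if=test1.tst of=test1.3 bs=4k     # copy at 4k: 32 reads per 128k record
dd if=test8k.tst of=test8k.2 bs=8k   # second capture: another 1G file copied at 8k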
root@zone1:~# zpool iostat 3
capacity operations bandwidth
pool alloc free read write read write
---------- ----- ----- ----- ----- ----- -----
rpool 291G 265G 0 21 20.4K 130K
rpool 291G 265G 0 0 0 0
rpool 291G 265G 0 515 0 38.9M ajordan@zone1:~/mnt/test$ mkfile 1G test1.tst
rpool 291G 265G 0 1.05K 0 121M
rpool 292G 264G 0 974 0 100M
rpool 292G 264G 0 217 0 26.7M
rpool 292G 264G 0 516 0 58.0M
rpool 292G 264G 0 0 0 0
rpool 292G 264G 0 0 0 0
rpool 292G 264G 0 96 0 619K
rpool 292G 264G 0 0 0 0
rpool 292G 264G 0 0 0 0
rpool 292G 264G 0 0 0 0
rpool 292G 264G 0 0 0 0
rpool 292G 264G 0 0 0 0
rpool 292G 264G 0 0 0 0
rpool 292G 264G 0 0 0 0
rpool 292G 264G 0 0 0 0
rpool 292G 264G 474 0 59.3M 0 ajordan@zone1:~/mnt/test$ dd if=test1.tst of=test1.2 bs=128k
rpool 292G 264G 254 593 31.8M 67.8M
rpool 292G 264G 396 230 49.6M 27.9M
rpool 293G 263G 306 453 38.3M 45.2M 8192+0 records in
rpool 293G 263G 214 546 26.9M 62.0M 8192+0 records out
rpool 293G 263G 486 0 60.8M 0
rpool 293G 263G 211 635 26.5M 72.9M
rpool 293G 263G 384 235 48.1M 29.2M
rpool 293G 263G 0 346 0 37.2M
rpool 293G 263G 0 0 0 0
rpool 293G 263G 0 0 0 0
rpool 293G 263G 0 0 0 0
rpool 293G 263G 0 0 0 0
rpool 293G 263G 0 0 0 0
rpool 293G 263G 0 0 0 0
rpool 293G 263G 0 0 0 0
rpool 293G 263G 0 0 0 0
rpool 293G 263G 1.05K 70 134M 3.52M ajordan@zone1:~/mnt/test$ dd if=test1.tst of=test1.3 bs=4k
rpool 293G 263G 1.45K 0 185M 0
rpool 293G 263G 1.35K 160 173M 10.0M
rpool 293G 263G 1.44K 0 185M 0
rpool 293G 263G 1.31K 180 168M 9.83M
rpool 293G 263G 1.36K 117 174M 9.20M
rpool 293G 263G 1.42K 0 181M 0
rpool 293G 263G 1.26K 120 161M 9.48M
rpool 293G 263G 1.49K 0 191M 0
rpool 293G 263G 1.40K 117 179M 9.23M
rpool 293G 263G 1.36K 159 175M 9.98M
rpool 293G 263G 1.41K 12 180M 158K
rpool 293G 263G 1.23K 167 157M 9.63M
rpool 293G 263G 1.54K 0 197M 0
rpool 293G 263G 1.36K 158 175M 9.70M
rpool 293G 263G 1.42K 151 181M 9.99M
rpool 293G 263G 1.41K 21 180M 268K
rpool 293G 263G 1.32K 132 169M 9.39M
rpool 293G 263G 1.48K 0 189M 0
rpool 294G 262G 1.42K 118 181M 9.32M
rpool 294G 262G 1.34K 121 172M 9.73M
rpool 294G 262G 859 2 107M 10.7K
rpool 294G 262G 1.34K 135 171M 6.83M
rpool 294G 262G 1.43K 0 183M 0
rpool 294G 262G 1.31K 120 168M 9.44M
rpool 294G 262G 1.26K 116 161M 9.11M
rpool 294G 262G 1.52K 0 194M 0
rpool 294G 262G 1.32K 118 170M 9.44M
rpool 294G 262G 1.48K 0 189M 0
rpool 294G 262G 1.23K 170 157M 9.97M
rpool 294G 262G 1.41K 116 181M 9.07M
rpool 294G 262G 1.49K 0 191M 0
rpool 294G 262G 1.38K 123 176M 9.90M
rpool 294G 262G 1.35K 0 173M 0
rpool 294G 262G 1.41K 114 181M 8.86M
rpool 294G 262G 1.29K 155 165M 10.3M
rpool 294G 262G 1.50K 7 192M 89.3K
rpool 294G 262G 1.43K 116 183M 9.03M
rpool 294G 262G 1.52K 0 194M 0
rpool 294G 262G 1.39K 125 178M 10.0M
rpool 294G 262G 1.28K 119 164M 9.52M
rpool 294G 262G 1.54K 0 197M 0
rpool 294G 262G 1.39K 120 178M 9.57M
rpool 294G 262G 1.45K 0 186M 0
rpool 294G 262G 1.37K 133 175M 9.60M
rpool 294G 262G 1.38K 173 176M 10.1M
rpool 294G 262G 1.61K 0 207M 0
rpool 294G 262G 1.47K 125 189M 10.2M
rpool 294G 262G 1.56K 0 200M 0
rpool 294G 262G 1.38K 124 177M 10.2M
rpool 294G 262G 1.37K 145 175M 9.95M
rpool 294G 262G 1.51K 28 193M 359K
rpool 294G 262G 1.32K 171 169M 10.1M
rpool 294G 262G 1.55K 0 199M 0
rpool 294G 262G 1.29K 119 165M 9.48M
rpool 294G 262G 1.11K 110 142M 8.36M
rpool 294G 262G 1.43K 0 183M 0
rpool 294G 262G 1.36K 118 174M 9.32M
rpool 294G 262G 1.49K 0 190M 0
rpool 294G 262G 1.35K 118 173M 9.32M
rpool 294G 262G 1.32K 146 169M 10.1M
rpool 294G 262G 1.07K 29 137M 363K 262144+0 records in
rpool 294G 262G 0 79 0 4.65M 262144+0 records out
rpool 294G 262G 0 0 0 0
rpool 294G 262G 0 0 0 0
rpool 294G 262G 0 0 0 0
rpool 294G 262G 0 0 0 0
rpool 294G 262G 0 0 0 0
rpool 294G 262G 0 0 0 0
root@zone1:~# zpool iostat 3
capacity operations bandwidth
pool alloc free read write read write
---------- ----- ----- ----- ----- ----- -----
rpool 292G 264G 0 21 22.6K 130K
rpool 292G 264G 0 0 0 0
rpool 292G 264G 0 0 0 0
rpool 292G 264G 1.03K 0 131M 0 ajordan@zone1:~/mnt/test$ dd if=test8k.tst of=test8k.2 bs=8k
rpool 292G 264G 1.10K 202 141M 16.4M
rpool 292G 264G 1.25K 25 161M 316K
rpool 292G 264G 960 215 120M 15.5M
rpool 292G 264G 1.25K 0 160M 0
rpool 292G 264G 1K 210 128M 14.8M
rpool 292G 264G 1010 159 126M 14.3M
rpool 292G 264G 1.28K 0 164M 0
rpool 292G 264G 1.08K 169 138M 15.6M
rpool 292G 264G 1.25K 0 161M 0
rpool 292G 264G 1.00K 166 128M 15.3M
rpool 293G 263G 998 201 125M 15.1M
rpool 293G 263G 1.19K 0 153M 0
rpool 293G 263G 655 161 82.0M 14.2M
rpool 293G 263G 1.27K 0 162M 0
rpool 293G 263G 1.02K 230 130M 12.7M
rpool 293G 263G 1.02K 204 130M 15.5M
rpool 293G 263G 1.23K 0 157M 0
rpool 293G 263G 1.11K 162 142M 14.8M
rpool 293G 263G 1.26K 0 161M 0
rpool 293G 263G 1.01K 168 130M 15.5M
rpool 293G 263G 1.04K 215 133M 15.5M
rpool 293G 263G 1.30K 0 167M 0
rpool 293G 263G 1.01K 210 129M 16.1M
rpool 293G 263G 1.24K 0 159M 0
rpool 293G 263G 1.10K 214 141M 15.3M
rpool 293G 263G 1.07K 169 137M 15.6M
rpool 293G 263G 1.25K 0 160M 0
rpool 293G 263G 1.01K 166 130M 15.0M
rpool 293G 263G 1.25K 0 160M 0
rpool 293G 263G 974 230 122M 15.8M
rpool 293G 263G 1.11K 160 142M 14.4M
rpool 293G 263G 1.26K 0 161M 0
rpool 293G 263G 1.06K 172 136M 15.8M
rpool 293G 263G 1.27K 0 162M 0
rpool 293G 263G 1.07K 167 136M 15.4M
rpool 293G 263G 1011 217 126M 15.8M
rpool 293G 263G 1.22K 0 156M 0
rpool 293G 263G 569 160 71.2M 14.6M 131072+0 records in
rpool 293G 263G 0 0 0 0 131072+0 records out
rpool 293G 263G 0 98 0 1.09M
rpool 293G 263G 0 0 0 0
rpool 293G 263G 0 0 0 0
rpool 293G 263G 0 0 0 0
Upvotes: 4