Reputation: 11
I am trying to use fio to verify data on storage after a shutdown. To do this, I run an fio write job with the --trigger-file option to stop fio midway through the operation (simulating a power-down), and then an fio read job with the --verify_state_load option to check only the portions of data that were completely written. However, the verify fails, and it seems that verify_state_load has no effect (the read verify works as expected if the write job is not terminated partway through by the trigger).
Are there any restrictions on using trigger/verify_state_load that I must pay attention to?
Write job parameters:
[global]
verify_fatal=1
do_verify=0
loops=1
group_reporting=1
filename=/dev/nvme0n1
cpus_allowed=0-7
cpus_allowed_policy=split
runtime=0
verify=crc32c-intel
direct=1
rw=randwrite
verify_offset=100
ioengine=libaio
iodepth=32
size=200mb
bs=4096
verify_backlog=16384.0
[job_0]
size=209715200
offset=0
[job_1]
size=209715200
offset=1744830464
[job_2]
size=209715200
offset=3489660928
[job_3]
size=209715200
offset=5234491392
[job_4]
size=209715200
offset=6979321856
[job_5]
size=209715200
offset=8724152320
[job_6]
size=209715200
offset=10468982784
[job_7]
size=209715200
offset=12213813248
Read job parameters:
[global]
verify_fatal=1
do_verify=1
loops=1
group_reporting=1
filename=/dev/nvme0n1
cpus_allowed=0-7
verify_state_load=1
cpus_allowed_policy=split
runtime=0
verify=crc32c-intel
direct=1
rw=read
verify_offset=100
ioengine=libaio
iodepth=32
size=1mb
bs=4096
verify_backlog=16384.0
[job_0]
size=1048576
offset=0
[job_1]
size=1048576
offset=1744830464
[job_2]
size=1048576
offset=3489660928
[job_3]
size=1048576
offset=5234491392
[job_4]
size=1048576
offset=6979321856
[job_5]
size=1048576
offset=8724152320
[job_6]
size=1048576
offset=10468982784
[job_7]
size=1048576
offset=1221381324
Errors in read job:
starting 8 processes
job_5: No I/O performed by libaio, perhaps try --debug=io option for details?
job_4: No I/O performed by libaio, perhaps try --debug=io option for details?
job_7: No I/O performed by libaio, perhaps try --debug=io option for details?
job_6: No I/O performed by libaio, perhaps try --debug=io option for details?
job_3: No I/O performed by libaio, perhaps try --debug=io option for details?
job_2: No I/O performed by libaio, perhaps try --debug=io option for details?
verify: bad header offset 466944, wanted 20480 at file /dev/nvme0n1 offset 20480, length 4096
verify: bad header offset 462848, wanted 24576 at file /dev/nvme0n1 offset 24576, length 4096
job_0: No I/O performed by libaio, perhaps try --debug=io option for details?
fio: pid=1920, err=84/file:io_u.c:1985, func=io_u_queued_complete, error=Invalid or incomplete multibyte or wide character
Upvotes: 0
Views: 1764
Reputation: 7174
Update
I think this ended up being filed by @RonenWeiss over on https://github.com/axboe/fio/issues/468 . Over there an fio maintainer suggested the problem was that the data was written with rw=randwrite but the separate "verification" stage was run with rw=read. Because fio bases the re-generated data on the job parameters, the second stage generates different data from the first stage and therefore reports mismatches. If the second stage uses rw=randread the mismatches should not be present (assuming the data was saved correctly ;-).
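As a rough sketch of that suggestion (not a tested job), the verification stage could reuse the write stage's parameters and change only the access pattern. The file name verify.fio is hypothetical, only a single job is shown, and keeping size and offset identical to the write stage is an assumption on my part rather than something confirmed in the issue:

```ini
; verify.fio (hypothetical name): read-back stage run after the write
; stage was stopped by the trigger
[global]
filename=/dev/nvme0n1
direct=1
ioengine=libaio
iodepth=32
bs=4096
verify=crc32c-intel
verify_offset=100
verify_fatal=1
do_verify=1
; load the state that was saved when the write stage was interrupted
verify_state_load=1

[job_0]
; random pattern to match the rw=randwrite used when writing
rw=randread
; assumption: same offset and size as job_0 in the write stage
offset=0
size=200m
```

The remaining jobs (job_1 to job_7) would follow the same shape with their own offsets.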
Original reply
Hmm, you might be better off asking the fio folks directly (if you choose to do so, take a look at https://github.com/axboe/fio/blob/master/REPORTING-BUGS and make sure you're on a fairly recent version of fio; see https://github.com/axboe/fio/releases to get an idea of which versions those might be).
Your jobs look a bit strange (verify_backlog doesn't take decimals, you could have made size a global in the read job but you keep repeating it, runtime is 0, etc.). The thing that immediately stands out is that your "verification" job uses a different size from the original job, which probably won't have a happy ending, but I can't say for sure that's the problem.
Unfortunately your jobs are big and complicated which makes spotting the problem difficult (I'm very casual so if I can't see the issue in 30 seconds I normally silently move on). If you are having trouble with an fio job I strongly recommend that you strip the job parameters down to the bare minimum that still make the problem happen (e.g. can you make the problem happen with only two jobs? Only one job? How many options can you remove from the globals such that the problem continues to happen?). That way people like me have one less excuse to avoid diagnosing the problem ;-)
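For illustration only, a stripped-down write stage along those lines might look like the sketch below. The file name write.fio is hypothetical, and which options can really be dropped while still reproducing the problem is something you would have to find out by experiment; this just shows the shape of a single-job starting point, run with the same --trigger-file option as before:

```ini
; write.fio (hypothetical name): single-job write stage for a minimal
; reproduction attempt
[global]
filename=/dev/nvme0n1
direct=1
ioengine=libaio
iodepth=32
bs=4096
; size stated once here instead of being repeated in every job
size=200m
verify=crc32c-intel
; only write the verification headers in this stage, do not read back
do_verify=0

[job_0]
rw=randwrite
offset=0
```

If the problem still shows up with this, try removing further options one at a time; if it goes away, add the original options back one by one until it returns.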
Upvotes: 1
Reputation: 11
My issue seems to have been that I was using an fio rw=randwrite job to generate the data and a separate rw=read job to verify it after the trigger. Using rw=randread to verify the data resolved the issue.
Upvotes: 1