Reputation: 1686
I am trying to benchmark my hard drive, that is, measure its latency (ms) and throughput (MB/s). To do that, I want to measure the execution time of Python's f.write function. What I need is to write exactly x bytes to my file. I understand that I need to open my file using
f = open(file_name, 'wb')
Then what I do is
for i in range(blocksize):
    f.write(b'\xff')
However, the throughput (MB/s) I obtain this way is far too low, while the latency looks correct. So I deduced that when I run the lines above, I am actually writing more than one byte to the file: I am writing a string containing one byte ... I know that objects don't really have a size in Python, but is there a way to fix this problem?
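For what it's worth, here is how I would check whether each call really writes a single byte; in Python 3, f.write returns the number of bytes written, and the file size on disk confirms it (a quick untested sketch, the file name is arbitrary):
import os

with open("probe.bin", "wb") as f:
    n = f.write(b'\xff')             # write() returns the number of bytes written
print(n)                             # -> 1
print(os.path.getsize("probe.bin")) # -> 1, exactly one byte on disk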
EDIT: OK, here is the new code; now the results are inexplicably too high! The write limit of my disk should be about 100 MB/s, but I get results ten times faster. What's wrong?
import sys
import time
f = open("test.txt",'wb+')
def file_write_seq_access(blocksize):
chunk = b'\xff'*4000
for i in range(blocksize//4000):
f.write(chunk)
if __name__ == '__main__':
start_time = time.time()
file_write_seq_access(int(sys.argv[1]))
stop_time = time.time()
diff = stop_time - start_time
print diff, "s"
print (int(sys.argv[1])/diff),"B/s"
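One guess I have not verified yet: maybe the data only reaches the OS cache, not the disk, before I stop the timer. Here is an untested sketch that flushes everything before taking the stop time (I pass f into the function just to keep the sketch self-contained):
import os
import sys
import time

def file_write_seq_access(f, blocksize):
    chunk = b'\xff' * 4000
    for _ in range(blocksize // 4000):
        f.write(chunk)

with open("test.txt", "wb+") as f:
    start_time = time.time()
    file_write_seq_access(f, int(sys.argv[1]))
    f.flush()             # empty Python's userspace buffer
    os.fsync(f.fileno())  # force the OS to flush its cache to disk
    diff = time.time() - start_time

print(diff, "s")
print(int(sys.argv[1]) / diff, "B/s")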
Upvotes: 1
Views: 5841
Reputation: 1612
To get accurate results, you need to use low-level I/O to minimize the per-call overhead, and to flush the buffers; otherwise your writes may end up buffered somewhere (for example by the OS you use).
import os
from time import perf_counter as time

def write_test(file, block_size, blocks_count):
    f = os.open(file, os.O_CREAT | os.O_WRONLY, 0o777)  # low-level I/O
    took = []
    for i in range(blocks_count):
        buff = os.urandom(block_size)  # get random bytes to write
        start = time()
        os.write(f, buff)
        os.fsync(f)  # force the write to reach the disk
        t = time() - start
        took.append(t)
    os.close(f)
    return took
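From the returned list you can then derive both numbers the question asks for, e.g. (a usage sketch; the file name, block size, and count are arbitrary):
took = write_test("bench.tmp", 1024 * 1024, 64)  # 64 blocks of 1 MiB each
total_bytes = 1024 * 1024 * 64
print("avg latency: %.3f ms/block" % (1000 * sum(took) / len(took)))
print("throughput: %.1f MiB/s" % (total_bytes / sum(took) / 1024 ** 2))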
That code is part of my hobby project, a simplistic tool in Python for benchmarking HDDs and SSDs. It is completely open source and currently in alpha, but you can already use it and, if you're interested, take part in its development. I hope you'll find some good ideas in it, or maybe even contribute yours. Here's the link: https://github.com/thodnev/MonkeyTest
Upvotes: 4
Reputation: 60227
Simply put, Python isn't fast enough for this kind of byte-by-byte writing, and file buffering and similar overheads add too much on top.
What you should do is chunk the operation:
import sys
blocksize = int(sys.argv[1])
chunk = b'\xff'*10000
with open("file.file", "wb") as f:
    for _ in range(blocksize // 10000):
        f.write(chunk)
Possibly using PyPy should give a further (very small, possibly negative) speed-up.
Note that the OS will interfere with timings here, so there's going to be a lot of variance. Using C might end up even faster.
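To tame that variance, you could repeat the run several times and report the median, e.g. (a sketch of my own; the file name, chunk size, and trial count are arbitrary):
import statistics
import time

def timed_write(blocksize, chunk=b'\xff' * 10000):
    # Time one chunked sequential write of blocksize bytes.
    start = time.perf_counter()
    with open("file.file", "wb") as f:
        for _ in range(blocksize // len(chunk)):
            f.write(chunk)
    return time.perf_counter() - start

trials = [timed_write(100 * 10 ** 6) for _ in range(5)]  # 5 runs of ~100 MB
print("median: %.3f s" % statistics.median(trials))      # median is robust to outliers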
After doing some timings, this matches dd for speed, so you're not going to be getting any faster.
Upvotes: 4