JahMyst

Reputation: 1686

Python - Benchmarking Disk - Write exactly x bytes in a file

I am trying to benchmark my hard drive, that is, to calculate its latency (ms) and throughput (MB/s). To do that, I want to measure the execution time of Python's f.write. What I need is to write exactly x bytes to my file. I understand that I need to open the file using

f = open(file_name, 'wb')

Then what I do is

for i in range(blocksize):
    f.write(b'\xff')

However, the throughput (MB/s) I obtain is far too low, while the latency looks correct. So I deduced that when I run the previous lines I am actually writing more than one byte to the file: I am writing a string containing one byte... I know that objects don't really have a size in Python, but is there a way to fix this problem?

EDIT: OK, here is the new code. Now the results are inexplicably too high! The write limit for my disk should be about 100 MB/s, but I get results ten times faster. What's wrong?

import sys
import time

f = open("test.txt",'wb+')

def file_write_seq_access(blocksize):
    chunk = b'\xff'*4000
    for i in range(blocksize//4000):
        f.write(chunk)

if __name__ == '__main__':
    start_time = time.time()
    file_write_seq_access(int(sys.argv[1]))
    stop_time = time.time()
    diff = stop_time - start_time 
    print diff, "s"
    print (int(sys.argv[1])/diff),"B/s" 

Upvotes: 1

Views: 5841

Answers (2)

thodnev

Reputation: 1612

To get sensible results you need to use low-level I/O, which minimizes call overhead, and flush the buffers; otherwise your writes could be buffered somewhere (for example, by the OS you use).

import os
from time import perf_counter as time

def write_test(file, block_size, blocks_count):
    f = os.open(file, os.O_CREAT|os.O_WRONLY, 0o777) # low-level I/O

    took = []
    for i in range(blocks_count):
        buff = os.urandom(block_size) # get random bytes
        start = time()
        os.write(f, buff)
        os.fsync(f) # force write to disk
        t = time() - start
        took.append(t)

    os.close(f)
    return took

That code is part of my hobby project, a simplistic tool in Python to benchmark HDDs and SSDs. It is completely open source and currently in alpha stage, though you can already use it and, if interested, participate in development. I hope you'll find some good ideas or maybe even contribute yours. Here's the link: https://github.com/thodnev/MonkeyTest
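To turn a list of per-block timings like the one returned above into the two figures asked about in the question, something like the following could work. This is only a minimal sketch: `summarize` is a hypothetical helper name (not part of MonkeyTest), and it assumes every block has the same size and that the timings are in seconds.

```python
def summarize(took, block_size):
    """Return (average latency in ms, throughput in MB/s).

    `took` is a list of per-block write times in seconds and
    `block_size` is the number of bytes written per block.
    Hypothetical helper, for illustration only.
    """
    total_time = sum(took)
    # Average time spent on one write + fsync, in milliseconds
    latency_ms = 1000.0 * total_time / len(took)
    # Total bytes written divided by total time, in MB/s (1 MB = 10**6 bytes)
    throughput_mb_s = (block_size * len(took)) / total_time / 1e6
    return latency_ms, throughput_mb_s
```

For example, `summarize(write_test('test.bin', 4096, 100), 4096)` would report the average latency of a 4096-byte synced write and the resulting throughput; the file name here is just a placeholder.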

Upvotes: 4

Veedrac

Reputation: 60227

Simply put, Python isn't fast enough for this kind of byte-by-byte writing, and file buffering and similar layers add too much overhead.

What you should do is chunk the operation:

import sys

blocksize = int(sys.argv[1])

chunk = b'\xff'*10000
with open("file.file", "wb") as f:
    for _ in range(blocksize // 10000):
        f.write(chunk)

Using PyPy might give a further (very small, possibly negative) speed-up.

Note that the OS will interfere with timings here, so there's going to be a lot of variance. Using C might end up even faster.


After doing some timings, this matches dd for speed, so you're not going to be getting any faster.
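If a timed run reports more than the disk can physically sustain, the clock has probably only captured writes into Python's buffer and the OS page cache. The sketch below keeps the chunked loop but flushes and syncs before stopping the timer, and it also writes the remainder so that exactly the requested number of bytes ends up in the file. `timed_write` and its parameters are illustrative names, and the result still depends on the OS and the drive honoring `fsync`.

```python
import os
import time

def timed_write(path, total_bytes, chunk_size=10000):
    """Write exactly total_bytes to path and return the elapsed seconds,
    including the time needed to push the data out of the caches."""
    chunk = b'\xff' * chunk_size
    full_chunks, remainder = divmod(total_bytes, chunk_size)
    start = time.perf_counter()
    with open(path, 'wb') as f:
        for _ in range(full_chunks):
            f.write(chunk)
        f.write(chunk[:remainder])  # leftover bytes that a plain // would drop
        f.flush()                   # empty Python's own buffer
        os.fsync(f.fileno())        # ask the OS to commit the data to the device
    return time.perf_counter() - start
```

Dividing `total_bytes` by the returned time gives a throughput in B/s that includes the cost of actually reaching the disk.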

Upvotes: 4
