Irrelevant
Irrelevant

Reputation: 198

Complexity of f.seek() in Python

Does f.seek(500000,0) go through all the first 499999 characters of the file before getting to the 500000th? In other words, is f.seek(n,0) of order O(n) or O(1)?

Upvotes: 19

Views: 8028

Answers (2)

Martijn Pieters
Martijn Pieters

Reputation: 1123900

You need to be a bit more specific on what type of object f is.

If f is a normal io module object for a file stored on disk, you have to determine if you are dealing with:

  • The raw binary file object
  • A buffer object, wrapping the raw binary file
  • A TextIO object, wrapping the buffer
  • An in-memory BytesIO or TextIO object

The first option just uses the lseek system call to reposition the file descriptor position. If this call is O(1) depends on the OS and what kind of file system you have. For a Linux system with ext4 filesystem, lseek is O(1).

Buffers just clear the buffer if your seek target is outside of the current buffered region and read in new buffer data. That's O(1) too, but the fixed cost is higher.

For text files, things are more complicated as variable-byte-length codecs and line-ending translation mean you can't always map the binary stream position to a text position without scanning from the start. The implementation doesn't allow for non-zero current-position- or end-relative seeks, and does it's best to minimise how much data is read for absolute seeks. Internal state shared with the text decoder tracks a recent 'safe point' to seek back to and read forward to the desired position. Worst-case this is O(n).

The in-memory file objects are just long, addressable arrays really. Seeking is O(1) because you can just alter the current position pointer value.

There are legion other file-like objects that may or may not support seeking. How they handle seeking is implementation dependent.

Etc. So, it depends.

Upvotes: 33

mksteve
mksteve

Reputation: 13085

It would depend on the implementation of f. However, in normal file-system files, it is O(1).

If python implements f on text files, it could be implemented as O(n), as each character may need to be inspected to manage cr/lf pairs correctly.

  • This would be based on whether f.seek(n,0) gave the same result as a loop of reading chars, and (depending on OS) cr/lf were shrunk to lf or lf expanded to cr/lf

If python implements f on a compressed stream, then the order would b O(n), as decompression may require some working of blocks, and decompression.

Upvotes: 1

Related Questions