Reputation: 6655
I have this method:
def get_chunksize(path):
    """
    Breaks a file into chunks and yields the chunk sizes.
    Number of chunks equals the number of available cores.
    Ensures that each chunk ends at an EOL.
    """
    size = os.path.getsize(path)
    cores = mp.cpu_count()
    chunksize = size/cores  # gives truncated integer
    f = open(path)
    while 1:
        start = f.tell()
        f.seek(chunksize, 1)  # Go to the next chunk
        s = f.readline()  # Ensure the chunk ends at the end of a line
        yield start, f.tell()-start
        if not s:
            break
It is supposed to break a file into chunks and return the start of each chunk (in bytes) and the chunk size.
Crucially, the end of a chunk should correspond to the end of a line (which is why the f.readline()
behaviour is there), but I am finding that my chunks are not seeking to an EOL at all.
The purpose of the method is to then read chunks which can be passed to a csv.reader
instance (via StringIO) for further processing.
I've been unable to spot anything obviously wrong with the function... any ideas why it is not moving to the EOL?
I came up with this rather clunky alternative:
def line_chunker(path):
    size = os.path.getsize(path)
    cores = mp.cpu_count()
    chunksize = size/cores  # gives truncated integer
    f = open(path)
    while True:
        part = f.readlines(chunksize)
        yield csv.reader(StringIO("".join(part)))
        if not part:
            break
This splits the file into chunks, with a csv reader for each chunk, but the last chunk is always empty (??), and having to join the list of strings back together is rather clunky.
Upvotes: 3
Views: 199
Reputation: 325
if not s: break
Instead of looking at s
to see whether you're at the end of the file, you should check whether you've actually reached the end of the file by using:
if size == f.tell(): break
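Applied to get_chunksize above, a minimal corrected sketch might look like this (assuming Python; the file is opened in binary mode so the relative seek also works on Python 3, the division is made explicitly integral with //, and the end position is clamped so a seek past EOF can't cause an endless loop):

```python
import os
import multiprocessing as mp

def get_chunksize(path):
    """Yield (start, length) pairs whose ends fall on line boundaries."""
    size = os.path.getsize(path)
    chunksize = size // mp.cpu_count()   # floor division, explicit on Python 3
    with open(path, 'rb') as f:          # binary mode permits relative seeks
        while True:
            start = f.tell()
            f.seek(chunksize, 1)         # jump roughly one chunk forward
            f.readline()                 # advance to the end of that line
            end = min(f.tell(), size)    # clamp in case the seek overshot EOF
            yield start, end - start
            if end == size:              # stop once the whole file is covered
                break
```

Each (start, length) pair can then be read back with f.seek(start); f.read(length) and handed to a csv.reader via StringIO.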
this should fix it. I wouldn't depend on a CSV file having a single record per line, though. I've worked with several CSV files that have strings containing newlines:
first,last,message
sue,ee,hello
bob,builder,"hello,
this is some text
that I entered"
jim,bob,I'm not so creative...
Notice that the 2nd record (bob) spans three lines. csv.reader can handle this. If the idea is to do some CPU-intensive work on a CSV, I'd create an array of threads, each with a buffer of n records, and have the csv.reader pass a record to each thread round-robin, skipping a thread whose buffer is full.
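To see that csv.reader really does cope with the embedded newlines in the sample above, a quick sketch (using io.StringIO, the Python 3 spelling of the StringIO the question imports):

```python
import csv
from io import StringIO  # Python 3; on Python 2 this was the StringIO module

# The sample file from above: bob's quoted field spans three physical lines.
data = (
    'first,last,message\n'
    'sue,ee,hello\n'
    'bob,builder,"hello,\n'
    'this is some text\n'
    'that I entered"\n'
    "jim,bob,I'm not so creative...\n"
)

rows = list(csv.reader(StringIO(data)))
print(len(rows))  # 4 records, even though the text has 6 physical lines
print(rows[2])    # ['bob', 'builder', 'hello,\nthis is some text\nthat I entered']
```

Because the reader tracks quoting, a newline inside a quoted field is part of the field, not a record separator, which is exactly why splitting the file on raw line boundaries can cut a record in half.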
Hope this helps - enjoy.
Upvotes: 1