Polly

Reputation: 1097

Splitting a text file

I have a folder containing a set of text documents. I want to split each document into two or three documents, each 45-70 KB in size.

How can I do it? I tried:

from itertools import chain

def split_file(filename, pattern, size):
    with open(filename, 'rb') as f:
        for index, line in enumerate(f, start=1):
            with open(pattern.format(index), 'wb') as out:
                n=0
                for line in chain([line], f):
                    out.write(line)
                    n += len(line)
                    if n >= 450000 and n <=700000:
                        break
if __name__ == '__main__':
    split_file('folderadress', 'part_{0:03d}.txt', 20000)

but it seems to me it's completely wrong.

Upvotes: 0

Views: 92

Answers (1)

cdarke

Reputation: 44354

This takes a different approach from yours: it reads each file in fixed-size chunks and writes each chunk to its own file. I have set the maximum size of each file to 1000 bytes for testing purposes:

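import glob
import os

dname = './gash'    # directory name
unit_size = 1000    # maximum size of each output file, in bytes

# glob.glob builds the full list up front, so the parts written
# below are not revisited during this run
for fname in glob.glob(os.path.join(dname, '*')):
    if not os.path.isfile(fname):
        continue    # skip subdirectories

    with open(fname, 'rb') as fo:
        n = 1
        while True:
            # in binary mode read() returns b"" (falsy) at EOF
            data = fo.read(unit_size)
            if not data:
                break

            # e.g. report.txt -> report.txt1, report.txt2, ...
            sub_fname = fname + str(n)
            with open(sub_fname, 'wb') as out:
                out.write(data)

            n += 1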

Note that this may split a line across two files; you do not say whether that would be a problem.
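If broken lines do matter, here is a minimal sketch of a line-aware variant. It is not part of the code above: the function name `split_on_lines`, the `max_bytes` parameter, and the `.partNNN` suffix are all made up for illustration. It assumes every single line is shorter than the limit; a longer line would end up alone in an oversized part. The 70,000-byte default is taken from the upper bound in the question, and the last part may well come out smaller than 45 KB.

```python
import os


def split_on_lines(path, max_bytes=70_000):
    """Split `path` into numbered parts of at most `max_bytes` each,
    breaking only at line ends. Returns the list of part paths.
    (Hypothetical helper; names and suffix are illustrative.)"""
    part_paths = []
    chunk, chunk_size = [], 0

    def flush():
        # Write the buffered lines to the next numbered part, then reset.
        nonlocal chunk, chunk_size
        if not chunk:
            return
        part_path = "{}.part{:03d}".format(path, len(part_paths) + 1)
        with open(part_path, "wb") as out:
            out.writelines(chunk)
        part_paths.append(part_path)
        chunk, chunk_size = [], 0

    with open(path, "rb") as src:
        for line in src:
            # Start a new part if this line would push the current one
            # past the limit.
            if chunk and chunk_size + len(line) > max_bytes:
                flush()
            chunk.append(line)
            chunk_size += len(line)
    flush()  # write whatever is left over
    return part_paths
```

Calling `split_on_lines(fname)` for each file in the folder loop would replace the inner `while` loop. Sizes are counted in bytes because the file is opened in binary mode, so the limit holds whatever the text encoding is.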

Upvotes: 2
