Reputation: 51
I am currently trying to split a large text file (>200 GB). The goal is to divide the large file into smaller chunks. I have written the following code and it works great on smaller files. However, on the large file my computer restarts. At this point I can't figure out if it is a hardware issue (i.e. processing power) or some other reason. I am also looking for ideas on a more efficient way of doing the same thing.
import os

def split(source, target, lines):
    index = 0
    block = 0
    if not os.path.exists(target):
        os.mkdir(target)
    with open(source, 'rb') as s:
        chunk = s.readlines()   # reads the whole file into memory
        while block < len(chunk):
            with open(target + f'file_{index:04d}.txt', 'wb') as t:
                t.writelines(chunk[block: block + lines])
            index += 1
            block += lines
Upvotes: 0
Views: 93
Reputation: 1083
It's the s.readlines() call that kills it, since it tries to load the entire file into memory at once.
You could do something like
with open("largeFile",'rb') as file:
while True:
data = file.read(1024) //blocksize
file.read() only reads the specified block size at a time, which should avoid the memory issue you're currently running into.
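For example, here is a minimal sketch of a byte-based splitter along those lines (the function name, the 1 GiB chunk_size, and the output naming are just illustrative assumptions; note the chunk boundaries won't fall on line breaks):

import os

def split_by_bytes(source, target, chunk_size=1024 * 1024 * 1024):
    # chunk_size is illustrative: roughly 1 GiB per output file
    os.makedirs(target, exist_ok=True)
    index = 0
    with open(source, 'rb') as s:
        while True:
            data = s.read(chunk_size)   # read at most chunk_size bytes
            if not data:                # end of file
                break
            with open(os.path.join(target, f'file_{index:04d}.txt'), 'wb') as t:
                t.write(data)
            index += 1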
EDIT:
I'm not smart, I missed the "text file" part in your title, sorry. In that case it should be enough to use file.readline() instead of file.readlines(), reading one line at a time, as in the sketch below.
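A minimal line-based sketch that keeps your one-file-per-N-lines layout (the function name, the default of 100000 lines, and the output naming are assumptions; iterating over the file object reads one line at a time, equivalent to calling readline() in a loop):

import os

def split_by_lines(source, target, lines=100000):
    os.makedirs(target, exist_ok=True)
    index = 0
    count = 0
    out = None
    with open(source, 'rb') as s:
        for line in s:                  # lazily reads one line at a time
            if count % lines == 0:      # start a new chunk file every `lines` lines
                if out:
                    out.close()
                out = open(os.path.join(target, f'file_{index:04d}.txt'), 'wb')
                index += 1
            out.write(line)
            count += 1
    if out:
        out.close()

This never holds more than one line in memory, so it should work the same on a 200 GB file as on a small one.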
Upvotes: 1