kadamb

Reputation: 1738

openpyxl not able to open more than 65k rows

I have an xlsx file with 500,000 rows that I want to copy into a CSV file, but I am only able to copy about 65k rows before the program ends. Here is the code I am using to copy data from the xlsx file to the CSV file; it is also taking too much time even to print 65k lines.

import openpyxl
import csv
import time

def csv_from_excel():
    t1 = time.time() * 1000
    # open in optimized (streaming) mode so rows are read lazily
    workbook = openpyxl.load_workbook(filename='5lac_xlsx.xlsx', optimized_read=True, use_iterators=True)
    sh = workbook.get_sheet_by_name('Sheet1')
    your_csv_file = open('your_csv_file.csv', 'wb')
    wr = csv.writer(your_csv_file, quoting=csv.QUOTE_ALL)
    t2 = time.time() * 1000
    print (t2 - t1)

    m1 = time.time() * 1000
    count = 0

    for row_t in sh.iter_rows():
        for cell in row_t:
            try:
                count = count + 1
                # cell[3] is the raw cell's internal value in the optimized reader;
                # this writes one value per CSV row
                wr.writerow([cell[3]])
            except Exception as e:
                print "error", e

    print "count", count

    your_csv_file.close()

    m2 = time.time() * 1000
    print (m2 - m1)

csv_from_excel()

Upvotes: 0

Views: 784

Answers (1)

sabbahillel

Reputation: 4425

A similar problem showed up when converting JSON to CSV, and it was solved by reading the input file in chunks. The example code below uses a small chunk size; a large file worked with a chunk size of 0x800000.

While the sample code is for JSON, you should be able to convert it to handle your Excel file on the same principle, reading the workbook with xlrd or one of the other Python Excel packages (a sketch of that adaptation follows the usage example below).

from json import JSONDecoder
from functools import partial


def json_parse(fileobj, decoder=JSONDecoder(), buffersize=2048):
    buffer = ''
    for chunk in iter(partial(fileobj.read, buffersize), ''):
        buffer += chunk
        while buffer:
            try:
                result, index = decoder.raw_decode(buffer)
                yield result
                buffer = buffer[index:]
            except ValueError:
                # Not enough data to decode, read more
                break

This function reads from the given file object in buffersize chunks and has the decoder object parse whole JSON objects out of the buffer. Each parsed object is yielded to the caller.

Use it like this:

with open('yourfilename', 'r') as infh:
    for data in json_parse(infh):
        # process each parsed object here
        pass
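For the Excel side of the same idea, here is a minimal sketch that streams rows straight into a CSV file using openpyxl's read-only mode (Python 3 style; the function name and arguments are illustrative, and values_only needs a reasonably recent openpyxl):

import csv
import openpyxl

def xlsx_to_csv_streamed(xlsx_path, csv_path, sheet_name='Sheet1'):
    # read_only=True streams rows lazily instead of loading the whole workbook
    workbook = openpyxl.load_workbook(filename=xlsx_path, read_only=True)
    sheet = workbook[sheet_name]
    with open(csv_path, 'w', newline='') as out:
        writer = csv.writer(out, quoting=csv.QUOTE_ALL)
        for row in sheet.iter_rows(values_only=True):
            # one CSV row per worksheet row
            writer.writerow(row)
    workbook.close()

xlsx_to_csv_streamed('5lac_xlsx.xlsx', 'your_csv_file.csv')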

Upvotes: 1
