Reputation: 1738
I have an xlsx file with 500,000 rows. I want to copy it into a CSV file, but I am only able to copy about 65k rows before the program ends. Here is the code I am using to copy data from the xlsx file to the CSV file; it also takes too much time just to write those 65k lines.
import csv
import time
import openpyxl

def csv_from_excel():
    t1 = time.time() * 1000
    # read_only mode streams rows on demand instead of loading
    # the entire workbook into memory
    workbook = openpyxl.load_workbook(filename='5lac_xlsx.xlsx', read_only=True)
    sh = workbook['Sheet1']
    t2 = time.time() * 1000
    print(t2 - t1)

    with open('your_csv_file.csv', 'w', newline='') as your_csv_file:
        wr = csv.writer(your_csv_file, quoting=csv.QUOTE_ALL)
        m1 = time.time() * 1000
        count = 0
        for row_t in sh.iter_rows():
            try:
                count += 1
                # write one CSV row per sheet row
                wr.writerow([cell.value for cell in row_t])
            except Exception as e:
                print("error:", e)
                print("count:", count)
    m2 = time.time() * 1000
    print(m2 - m1)

csv_from_excel()
Upvotes: 0
Views: 784
Reputation: 4425
A similar problem showed up when converting JSON to CSV, and it was solved by reading the input file in chunks. The example code below uses a small chunk size; a large file worked with a chunk size of 0x800000 bytes.
While the sample code is for JSON, you should be able to apply the same principle to your Excel file, using an Excel reader such as xlrd or one of the other Python Excel packages.
from json import JSONDecoder
from functools import partial

def json_parse(fileobj, decoder=JSONDecoder(), buffersize=2048):
    buffer = ''
    for chunk in iter(partial(fileobj.read, buffersize), ''):
        buffer += chunk
        while buffer:
            try:
                result, index = decoder.raw_decode(buffer)
                yield result
                buffer = buffer[index:]
            except ValueError:
                # Not enough data to decode, read more
                break
This function reads from the given file object in buffersize chunks and has the decoder parse whole JSON objects out of the buffer. Each parsed object is yielded to the caller.
Use it like this:
with open('yourfilename', 'r') as infh:
    for data in json_parse(infh):
        # process object
        pass
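Applying the same streaming idea to the xlsx-to-csv problem in the question, here is a minimal sketch (not tested against your file) that uses openpyxl's read-only mode to iterate rows one at a time and write each to the CSV as it goes, so the full 500,000 rows never sit in memory at once. The file and sheet names are the ones from the question:

```python
import csv

from openpyxl import load_workbook  # third-party: pip install openpyxl

def xlsx_to_csv(xlsx_path, csv_path, sheet_name='Sheet1'):
    # read_only=True streams rows on demand instead of loading
    # the whole workbook, which is what makes 500k rows feasible
    workbook = load_workbook(filename=xlsx_path, read_only=True)
    sheet = workbook[sheet_name]
    with open(csv_path, 'w', newline='') as out:
        writer = csv.writer(out, quoting=csv.QUOTE_ALL)
        for row in sheet.iter_rows():
            # one CSV row per sheet row
            writer.writerow([cell.value for cell in row])

# Usage (paths from the question):
# xlsx_to_csv('5lac_xlsx.xlsx', 'your_csv_file.csv')
```

Because each row is written and discarded immediately, memory use stays flat regardless of the workbook size.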
Upvotes: 1