Reputation: 87
I have an Excel file with items and descriptions and I'm trying to compare descriptions for similarity, and if they're similar, put them in a new Excel file. Those items also have Catalog #'s and I'm comparing them to see if they're nothing like each other and they're from the same vendor (buy_line) put them also on the new Excel file. When I run the file, it takes way too long and after I leave it running, I come back and Spyder is closed and no new file. So this is a 2 part question, is there a way to make the code faster? and why is there no file created? Thank you in advance. My code is below
`import xlrd
import xlsxwriter
from fuzzywuzzy import fuzz
AllItems = xlrd.open_workbook('2-18All_Items-CleanUp.xlsx','rb')
sheets = AllItems.sheet_names()
item = []
base = []
kit = []
buy_line = []
catalogs = []
descriptions = []
similar_desc_item = []
similar_desc = []
diff_catalog_samebuyline = []
sh = AllItems.sheet_by_index(0)
def readexcelfunc():
for rownum in range(sh.nrows):
row_values = sh.row_values(rownum)
item.append((row_values[0]))
base.append((row_values[1]))
kit.append((row_values[2]))
buy_line.append((row_values[6]))
catalogs.append((row_values[8]))
descriptions.append((row_values[12]))
def check_similar_desc():
for i,k in enumerate(descriptions):
for j,l in enumerate(descriptions):
ratio1 = fuzz.token_sort_ratio(k,l)
if ratio1 > 95 and k != l and base[i] != base[j] and kit[i] == "No":
similar_desc_item.append(item[i])
def check_notmatching_catalog():
for x,a in enumerate(catalogs):
for y,b in enumerate(catalogs):
ratio2 = fuzz.token_sort_ratio(a,b)
if ratio2 < 10 and buy_line[x] == buy_line[y]:
diff_catalog_samebuyline.append(catalogs[x])
def Create_ExcelFile():
NewWorkbook = xlsxwriter.Workbook('Sim_Desc.xlsx')
worksheet = NewWorkbook.add_worksheet()
row1 = 0
row2 = 0
for items in similar_desc_item:
worksheet.write(row1,0,items)
row1 += 1
for catalognumb in diff_catalog_samebuyline:
worksheet.write(row2,3,catalognumb)
NewWorkbook.save()
NewWorkbook.close()
readexcelfunc()
check_similar_desc()
print (similar_desc_item)
check_notmatching_catalog()
Create_ExcelFile()
print("Finished")`
Upvotes: 1
Views: 3319
Reputation: 41644
There are a few issues in the Create_ExcelFile()
function. The first is that there is no workbook save()
method. Also you aren't incrementing row2, so the second write()
will alway write to the first row, and overwrite whatever else is there. However, most importantly, the close()
method is at the wrong level so you are closing the file too early. Something like this should work:
def Create_ExcelFile():
NewWorkbook = xlsxwriter.Workbook('Sim_Desc.xlsx')
worksheet = NewWorkbook.add_worksheet()
row1 = 0
row2 = 0
for items in similar_desc_item:
worksheet.write(row1,0,items)
row1 += 1
for catalognumb in diff_catalog_samebuyline:
worksheet.write(row2,3,catalognumb)
# Fix the row2 increment!!
NewWorkbook.close()
Upvotes: 1