Reputation:
I'm trying to select 408 random rows from an Excel file and save them as a new Excel file, so that a row can only be selected once, but I'm getting an IndexError that I don't understand.
Here is my code :
# opening the source excel file
wb1 = xl.load_workbook('Test_low.xlsx')
ws1 = wb1.worksheets[0]
# opening the destination excel file
wb2 = xl.load_workbook('Parsed.xlsx')
ws2 = wb2.active
#number of rows
lastrow = ws1.max_row
#number of columns
lastcol = ws1.max_column
#select 408 random rows
i = 0
items = [*range(1,lastrow+1)]
while i <= 407:
for j in range(1, lastcol):
selected = randint(1,lastrow)
# read row value from source excel file
c = ws1.cell(row = selected,column = j)
# write the read value to destination excel file
ws2.cell(row = i+1,column = j).value = c.value
del items[selected]
i =+ 1
# save destination excel file
wb2.save(str('Parsed.xlxs'))
Here is the error raised:
File "C:\(code file)", line 35, in <module>
del items[selected]
IndexError: list assignment index out of range
Can someone help me understand how "selected" is out of range? I'm still a beginer at Python so I'm a bit lost. Thank you!
Upvotes: 1
Views: 205
Reputation: 393
it was out of range as you were indexing from a list that you were deleting items from. instead simply get a random sample of indexes so we wont need to remove duplicates then simply iterate through all columns of sampled rows and append them to your new excel sheet. (I'm not sure why you were adding one to all your ranges so I removed that).
items = [*range(lastrow)]
selectedRows = random.sample(items, 408)
for i in selectedRows:
for j in range(lastcol):
# read row value from source excel file
c = ws1.cell(row = i,column = j)
# write the read value to destination excel file
ws2.cell(row = i,column = j).value = c.value
Upvotes: 1