Reputation: 539
I have some data stored in Excel tables (.xlsx), which my current Python script reads into memory and uses for calculations. I will explain my script with an example.
Say my Excel file has this data in a specific column: a = [1,2,3,4,5].
I am reading this whole thing into memory using pandas (pd.read_excel()) and running my own iterator function to get:
a0 = [1,2,3,4,5]
a1 = [5,1,2,3,4]
a2 = [4,5,1,2,3]
and so on. Basically, I am shifting the elements cyclically by some integer amount. a0, a1 and a2 appear here as lists, but they are iterator objects; I don't store them.
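To illustrate what I mean by the shifting, here is a minimal sketch rather than my actual code. The helper name rotated is made up for this example, and it assumes a is already in memory as a list:

```python
from itertools import chain, islice

def rotated(values, shift):
    """Lazily yield `values` rotated to the right by `shift` positions.

    `rotated` is a hypothetical helper for illustration; it assumes
    `values` is a sequence with a known length (e.g. a list).
    """
    n = len(values)
    if n == 0:
        return iter(())
    k = shift % n
    # The last k items come first, followed by the first n - k items.
    return chain(islice(values, n - k, None), islice(values, 0, n - k))

a = [1, 2, 3, 4, 5]
# Each of these is an iterator; nothing is copied until it is consumed.
a0, a1, a2 = (rotated(a, s) for s in range(3))
```

Note that a1 has to start with the last element of a, so any shifted iterator still needs the end of the column before it can yield its first value.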
As you can see, a0 is always the same as a, and I don't really need to keep a in memory because I only need it once, which is what a0 covers. So what I am trying to do is have some sort of iterator object that iterates over the Excel file directly and produces a0, a1 and a2, as if I had imported a first and then iterated over a to get a0, a1, a2.
The reason I am trying to do this is that the time my script spends on calculations is shorter than the time it takes to import the data from Excel. So, to improve my script's performance, I need a way to iterate over the Excel file directly rather than loading all the data into memory. I would appreciate any help with this.
Addition, from my comment: If pandas or some other library had a readThisCell() kind of function, it would be easy for me to write my own Excel iterator. But I don't know what my options are with pandas or any other library.
Upvotes: 2
Views: 6468
Reputation: 27077
I don't have experience with the pandas read_excel function, but we have had good success with openpyxl. That library lets you define a variable pointing to a specific worksheet and then iterate over that variable, as follows (pulled directly from their tutorial):
from openpyxl import load_workbook

# read_only=True streams the sheet instead of loading it all at once
wb = load_workbook(filename='large_file.xlsx', read_only=True)
ws = wb['big_data']  # ws is a read-only worksheet that yields rows lazily

for row in ws.rows:
    for cell in row:
        print(cell.value)
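If only a single column is needed, something along these lines might work. This is a sketch, not something taken from the tutorial: the function name iter_column_values is made up, and the file name, sheet name and column index are placeholders. It relies on openpyxl's iter_rows with min_col, max_col and values_only.

```python
def iter_column_values(path, sheet_name, column_index):
    """Yield the values of one column (1-based index), one row at a time.

    Hypothetical helper for illustration; requires the third-party
    openpyxl package.
    """
    from openpyxl import load_workbook

    wb = load_workbook(filename=path, read_only=True)
    try:
        ws = wb[sheet_name]
        rows = ws.iter_rows(min_col=column_index, max_col=column_index,
                            values_only=True)
        for row in rows:
            # Each row is a tuple holding the single requested cell value.
            yield row[0] if row else None
    finally:
        # Read-only workbooks keep the file open until closed explicitly.
        wb.close()
```

You would use it like `for value in iter_column_values('large_file.xlsx', 'big_data', 1): ...`. The values arrive one row at a time, so the column never has to be held in memory as a whole unless your own code stores it.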
Upvotes: 1