Kaiyuan Zheng
Kaiyuan Zheng

Reputation: 33

Extract continuous cell values from multiple excel files using Python

The aim of my task is firstly to extract values from continuous cells of a single excel file. Then the same extraction method will be performed on the remaining excel files of the same folder until the loop ends

For example,
I want to extract values from row 'A283:A9000' at excel file 1. After the extraction at excel file 1 is finished, the value from row 'A283:A9000' at excel file 2 will be extracted, then the extraction at the same rows 'A283:A9000' will be continued on excel file 3, excel file 4, excel files 5 and so on.

I learn how to extract values from multiple excel files from https://www.youtube.com/watch?v=M7YkQpcB4fg The code works well when the values from non-continuous cells are extracted. However, when I try to use the code to extract value from continuous cells ('A283:A9000') of the same sheet, the code fails.

I know the problem occurs when I try to use the code to extract values from continuous cells of the same sheet, but I am not sure how to fix the code to custom with my case. I think the problem is located at the line (cells = ['C11', 'C15', 'D15', 'C16', 'A283:A9000']).

Could anyone give me help?

Cheers

Here is the code that I have tried.

import os
import openpyxl 

folder = r'C:\PhD study\GIS\Wind_Downscale\test_one'
output_file = 'C:\PhD study\GIS\Wind_Downscale\Wind_data_forecast_time.xlsx'

output_wb = openpyxl.Workbook()
output_sheet = output_wb.active
output_sheet.title = 'Wind Data for Forecast Time'

cells = ['C11', 'C15', 'D15', 'C16', 'A283:A9000']

for filename in os.listdir(folder):
    if filename.endswith('.xlsx'):
        
        file = os.path.join(folder, filename)
        
        workbook = openpyxl.load_workbook(file)
       
   
        values = [workbook.active[cell].value for cell in cells]

        
        output_sheet.append(values)
        
output_wb.save(output_file)

Here is the error messsage:

Traceback (most recent call last):

File C:\\Conda5\\lib\\site-packages\\spyder_kernels\\py3compat.py:356 in compat_exec
exec(code, globals, locals)

File c:\\users\\kxz237.spyder-py3\\temp.py:29
values = \[workbook.active\[cell\].value for cell in cells\]

File c:\\users\\kxz237.spyder-py3\\temp.py:29 in \<listcomp\>
values = \[workbook.active\[cell\].value for cell in cells\]

AttributeError: 'tuple' object has no attribute 'value'

Upvotes: 2

Views: 128

Answers (2)

moken
moken

Reputation: 6639

Yes you are trying to use 'A283:A9000' as a single cells' co-ordinate hence the attribute error.

An alternative is you can treat every element of your 'cells' list as a range
cells = ['C11', 'C15', 'D15', 'C16', 'A283:A9000']
so for each element the code extracts all the cells that the range covers;
for 'C11' that would be just 'C11'
for 'A283:A9000' that would be 'A283', 'A284', 'A285', 'A286', ...

Use the Openpyxl util openpyxl.utils.cell.cols_from_range(<cells element>) on each element in the cells list.

import os
import openpyxl 

folder = r'C:\PhD study\GIS\Wind_Downscale\test_one'
output_file = 'C:\PhD study\GIS\Wind_Downscale\Wind_data_forecast_time.xlsx'

output_wb = openpyxl.Workbook()
output_sheet = output_wb.active
output_sheet.title = 'Wind Data for Forecast Time'

cells = ['C11', 'C15', 'D15', 'C16', 'A283:A9000']

for filename in os.listdir(folder):
    if filename.endswith('.xlsx'):
        file = os.path.join(folder, filename)

        workbook = openpyxl.load_workbook(file)

        #values = [workbook.active[cell].value for cell in cells]

        for rng in cells:  # Each element in 'cells' list 
            ### Get all cells in the elements range
            for allcells in openpyxl.utils.cell.cols_from_range(rng):
                ### allcells is a tuple of all individual cells in the range rng
                for cell in allcells:  # Extract each cell
                    values = workbook.active[cell].value

                    output_sheet.append([values])

output_wb.save(output_file)

Additional details FYI
There are two Openpyxl utilities that will return the individual cells of a range
openpyxl.utils.cell.cols_from_range(range_string) and
openpyxl.utils.cell.rows_from_range(range_string)

Either could be used in this scenario given the range provided is just one column.
However if your range covered two or more columns then the way each would return the individual cells is;
cols_from_range, cell from each row down the first column, then same down next column etc
rows_from_range, cells across all columns in first row, then all cells in all columns in the second row etc.

i.e. a range 'C3:D4' would return
cols_from_range, 'C3', 'C4', 'D3', 'D4'
rows_from_range, 'C3', 'D3', 'C4', 'D4'

Upvotes: 1

Matt Smith
Matt Smith

Reputation: 158

I don't know if openpyxl has a method for reading from cell ranges (what you are referring to as continuous cells) which may be faster. However, notice that your range of cells could also be represented by list of individual cell addresses. So if you first use a list comprehension to build a list of these individual cell addresses, it should work.

Try replacing this line in your code:

cells = [f"A{i}" for i in range(283, 9000+1)]

This will generate a list like ['A283', 'A284' ... 'A9000']

So, the whole thing in your example (assuming you don't actually want those initial cells 'C11', 'C15', 'D15', 'C16' in your list) would be...

import os
import openpyxl 

folder = r'C:\PhD study\GIS\Wind_Downscale\test_one'
output_file = 'C:\PhD study\GIS\Wind_Downscale\Wind_data_forecast_time.xlsx'

output_wb = openpyxl.Workbook()
output_sheet = output_wb.active
output_sheet.title = 'Wind Data for Forecast Time'

cells = [f"A{i}" for i in range(283, 9000+1)]

for filename in os.listdir(folder):
    if filename.endswith('.xlsx'):
        
        file = os.path.join(folder, filename)
        
        workbook = openpyxl.load_workbook(file)
       
   
        values = [workbook.active[cell].value for cell in cells]

        
        output_sheet.append(values)
        
output_wb.save(output_file)

Upvotes: 2

Related Questions