danchester82
danchester82

Reputation: 13

How do I edit this code so it extracts data from a specific sheet from multiple excel files to display in a separate file

I have python code that will take data from multiple excel files and display the selected data in a separate file for further analysis. It works:

import os
import openpyxl

folder = 'D:\My Documents\Code\Data'
output_file = 'D:\My Documents\Code\Data\masterfile.xlsx'

output_wb = openpyxl.Workbook()
output_sheet = output_wb.active
output_sheet.title = 'masterfile'

cells = ['A2', 'B2', 'C2']

for filename in os.listdir(folder):
    if filename.endswith('.xlsx'):
        filename = os.path.join(folder, filename)
        workbook = openpyxl.load_workbook(filename)
        values = [workbook.active[cell].value for cell in cells]
        output_sheet.append(values)

output_wb.save(output_file)

However, it can only take code from the active sheet... Which means if I want to take data from Sheet2 I have to save all the excel files (hundreds of them) with Sheet2 as the active sheet otherwise it wont work. So I thought I would try and set the active sheet in the code itself but it does not work. I have only been learning Python for a month and I am not a coder by trade, but a scientist working with a lot of biological data. This codes does not work and I need help to fix it.

import os
import openpyxl

folder = 'D:\My Documents\Code\Data'
output_file = 'D:\My Documents\Code\Data\masterfile.xlsx'

wb = openpyxl.Workbook()

output_wb = openpyxl.Workbook()
output_sheet = output_wb.active
output_sheet.title = 'masterfile'

cells = ['A2', 'B2', 'C2']

for filename in os.listdir(folder):
    if filename.endswith('.xlsx'):
        filename = os.path.join(folder, filename)
        workbook = openpyxl.load_workbook(filename)
        workbook.active = workbook.sheetnames.index('Sheet2')
        values = [workbook.active[cell].value for cell in cells]
        output_sheet.append(values)

output_wb.save(output_file)

I tried setting the active worksheet as Sheet2 but it only takes data from the active sheet the original excel files are stored under.

I have tried the suggestions in these links but to no avail:

Set the active sheet with openpyxl How to switch between sheets in Excel openpyxl Python to make changes Get sheet by name using openpyxl

Any and all help appreciated.

Upvotes: 1

Views: 92

Answers (1)

moken
moken

Reputation: 6630

You are not missing any modules

A quick tutorial on selecting worksheets

The active sheet is the sheet that had focus when the workbook was saved and can be accessed as

sheet = wb.active

where 'wb' is the Openpyxl workbook object. This is also for selecting the default sheet when creating a new workbook.

If you want any other Sheet in the workbook then you can select it by name;

sheet = wb['<sheet name>']

where <sheet name> is the name (aka 'title' in Openpyxl) of the sheet, noting the names are case sensitive. wb['sheet1'] will return an error if the sheet name is 'Sheet1'.

Further; the names of the sheets are contained in the list wb.sheetnames i.e. this is a list of the names of the sheets only, as strings.
Therefore you could select every sheet in the workbook by;

for sheetname in wb.sheetnames:
    sheet = wb[sheetname]
    ...

but would probably only use this method if you want to filter sheets selected by the title.

The actual worksheet objects are held in the list wb.worksheets.
If you want to work on every worksheet in the workbook you can loop through all the Sheets by;

for sheet in wb.worksheets:
    print(sheet.title)
    # do something with the worksheet object 'sheet' having 'title' ...

therefore with this list a sheet can also be selected using its index as;

sheet = wb.worksheets[X]

where X is the index of the sheet you require.

The Sheets are generally listed in the order they appear in the workbook.

Upvotes: 1

Related Questions