How to iterate thorugh multiple excel sheets using openpyxl library in Python?

Question

I am using Openpyxl library to read xlsx file and extract few contents and add more strings to my txt output file.The excel file I am currently using contain sheets with name Summary and Employee. My below code is working fine for my current excel file. Now the issue is I would to use the same code for reading another excel file containing more sheets whose sheetnames I am not sure of. So in my code line ws = wb['Employee']. The sheetname will change all the time. However, One thing I am sure about is I don't want to read any data from sheet1. All the data extraction will occur from sheet2 onwards in all the xlsx files. I am not sure how to proceed from here so any help will be appreciated.

Thanks in advance for your time and efforts!

Code:

from openpyxl import load_workbook

data_file='\test.xlsx'

# Load the entire workbook.
wb = load_workbook(data_file)
ws = wb['Employee'] #Manually adding sheet name here


mylines={"Column_name":[],"Column_Type":[]} #Getting 2 columns data from row 6

type_strs = {
    'String': 'VARCHAR(256)',
    'Numeric': 'NUMBER',
    'Date': 'NUMBER(4,0)',
    'Int': 'NUMBER'
}

for index, value in enumerate(mylines["Column_Type"]):
    mylines["Column_Type"][index] = type_strs.get(value, value)

    
for i in range(6, ws.max_row+1):  
        name = ws.cell(row=i, column=1).value
        name1=ws.cell(row=i, column=2).value
        mylines["Column_name"].append(name) #Appending dictionary key "Column_name"
        mylines["Column_Type"].append(name1) #Appending dictionay key "Column_type"
        for index, value in enumerate(mylines["Column_Type"]):
            mylines["Column_Type"][index] = type_strs.get(value, value)
        
        
        
theString = " "
for i in range(len(mylines['Column_name'])):
    theString += mylines['Column_name'][i] + " " + mylines['Column_Type'][i]
    if i < len(mylines['Column_name'])-1:
        theString += ", "


outputFile = open('/output.txt', 'w')  # Text file Output
outputFile.write("CREATE TABLE TRANSIENT TABLE STG_EMPLOYEE({});".format(theString) + "
")

outputFile.close() #Closing file

Updated Code based on SO User comment:

from openpyxl import load_workbook

data_file='\test.xlsx'

# Load the entire workbook.
wb = load_workbook(data_file)
#ws = wb['Employee'] #Manually adding sheet name here


mylines={"Column_name":[],"Column_Type":[]} #Getting 2 columns data from row 6

type_strs = {
    'String': 'VARCHAR(256)',
    'Numeric': 'NUMBER',
    'Date': 'NUMBER(4,0)',
    'Int': 'NUMBER'
}

for index, value in enumerate(mylines["Column_Type"]):
    mylines["Column_Type"][index] = type_strs.get(value, value)

skip = True
for ws in wb.worksheets:
    if skip == True:
        skip = False
    else:   
        for i in range(6, ws.max_row+1):  
            name = ws.cell(row=i, column=1).value
            name1=ws.cell(row=i, column=2).value
            mylines["Column_name"].append(name) #Appending dictionary key "Column_name"
            mylines["Column_Type"].append(name1) #Appending dictionay key "Column_type"
        for index, value in enumerate(mylines["Column_Type"]):
            mylines["Column_Type"][index] = type_strs.get(value, value)
        
        
        
theString = " "
for i in range(len(mylines['Column_name'])):
    theString += mylines['Column_name'][i] + " " + mylines['Column_Type'][i]
    if i < len(mylines['Column_name'])-1:
        theString += ", "


outputFile = open('/output.txt', 'w')  # Text file Output
outputFile.write("CREATE TABLE TRANSIENT TABLE STG_EMPLOYEE({});".format(theString) + "
")

outputFile.close() #Closing file

Excel data

 Summary Sheet: Empty
 Employee Sheet 
                File Name:  Employee
                Sheet Name: Employee
                File Type:  csv
    
                Field Name  Type
                   Name     String
            Salary  Numeric
            Date    Date
            Phone       Int

   Employee1 Sheet
            File Name:  Employee
            Sheet Name: Employee1
            File Type:  csv
    
            Field Name  Type
            Employee Name   Date
            Employee Salary Int
            Employment Date Int
            Office Phone    Int

Redox · Accepted Answer

To iterate through all worksheets in a workbook and read data in them (except the first worksheet, remove the ws = wb['Employee']

Use a for loop (insert before for i in range(5,... as this

skip = True
for ws in wb.worksheets:
    if skip == True:
        skip = False
    else:
        for i in range(6, ws.max_row+1):
            name = ws.cell(row=i, column=1).value
            ....

This will read each sheet and append data to mylines, except the first sheet

Second Update As you mentioned in below comment, to add a new line with the new SQL query, please make these additional changes

Add another entry to dictionary to indicate new line as below (careful to ensure the lines execute after all lines in a particular sheet are read) Edit the String formation so that once a NewLine is seen, that string is written to the output file. Do note that the NewFile boolean value will overwrite any file that is there. Multiple lines will be appended post that.

skip = True
for ws in wb.worksheets:
    if skip == True:
        skip = False
    else:
        for i in range(6, ws.max_row+1):  
            name = ws.cell(row=i, column=1).value
            print(i, name)
            name1=ws.cell(row=i, column=2).value
            print(name1)
            mylines["Column_name"].append(name) #Appending dictionary key "Column_name"
            mylines["Column_Type"].append(name1) #Appending dictionay key "Column_type"
            for index, value in enumerate(mylines["Column_Type"]):
                mylines["Column_Type"][index] = type_strs.get(value, value)
        mylines["Column_name"].append('NextLine')
        mylines["Column_Type"].append('NextLine')
theString = " "
NewFile = True
sheetList = wb.sheetnames
tabIndex = 1

for i in range(len(mylines['Column_name'])):
    if(mylines['Column_name'][i] != 'NextLine'):
        theString += mylines['Column_name'][i] + " " + mylines['Column_Type'][i]
        theString += ", "
    else:
        theString = theString[:-2]
        if NewFile:
            NewFile = False
            outputFile = open('output.txt', 'w')  # Text file Output
            print("New file  ", theString)
        else:
            outputFile = open('output.txt', 'a')
            print("Not new file  ", theString)
        outputFile.write("CREATE TABLE TRANSIENT TABLE STG_" + sheetList[tabIndex] +"({});".format(theString) + "
")
        outputFile.close()
        tabIndex += 1
        theString = " "

How to iterate thorugh multiple excel sheets using openpyxl library in Python?

Answers (1)

Related Questions