Reputation: 300
I am using Openpyxl
library to read xlsx
file and extract few contents and add more strings to my txt output file.The excel file I am currently using contain sheets with name Summary and Employee. My below code is working fine for my current excel file. Now the issue is I would to use the same code for reading another excel file containing more sheets whose sheetnames I am not sure of. So in my code line ws = wb['Employee']
. The sheetname will change all the time. However, One thing I am sure about is I don't want to read any data from sheet1. All the data extraction will occur from sheet2 onwards in all the xlsx
files. I am not sure how to proceed from here so any help will be appreciated.
Thanks in advance for your time and efforts!
Code:
from openpyxl import load_workbook
data_file='\\test.xlsx'
# Load the entire workbook.
wb = load_workbook(data_file)
ws = wb['Employee'] #Manually adding sheet name here
mylines={"Column_name":[],"Column_Type":[]} #Getting 2 columns data from row 6
type_strs = {
'String': 'VARCHAR(256)',
'Numeric': 'NUMBER',
'Date': 'NUMBER(4,0)',
'Int': 'NUMBER'
}
for index, value in enumerate(mylines["Column_Type"]):
mylines["Column_Type"][index] = type_strs.get(value, value)
for i in range(6, ws.max_row+1):
name = ws.cell(row=i, column=1).value
name1=ws.cell(row=i, column=2).value
mylines["Column_name"].append(name) #Appending dictionary key "Column_name"
mylines["Column_Type"].append(name1) #Appending dictionay key "Column_type"
for index, value in enumerate(mylines["Column_Type"]):
mylines["Column_Type"][index] = type_strs.get(value, value)
theString = " "
for i in range(len(mylines['Column_name'])):
theString += mylines['Column_name'][i] + " " + mylines['Column_Type'][i]
if i < len(mylines['Column_name'])-1:
theString += ", "
outputFile = open('/output.txt', 'w') # Text file Output
outputFile.write("CREATE TABLE TRANSIENT TABLE STG_EMPLOYEE({});".format(theString) + "\n")
outputFile.close() #Closing file
Updated Code based on SO User comment:
from openpyxl import load_workbook
data_file='\\test.xlsx'
# Load the entire workbook.
wb = load_workbook(data_file)
#ws = wb['Employee'] #Manually adding sheet name here
mylines={"Column_name":[],"Column_Type":[]} #Getting 2 columns data from row 6
type_strs = {
'String': 'VARCHAR(256)',
'Numeric': 'NUMBER',
'Date': 'NUMBER(4,0)',
'Int': 'NUMBER'
}
for index, value in enumerate(mylines["Column_Type"]):
mylines["Column_Type"][index] = type_strs.get(value, value)
skip = True
for ws in wb.worksheets:
if skip == True:
skip = False
else:
for i in range(6, ws.max_row+1):
name = ws.cell(row=i, column=1).value
name1=ws.cell(row=i, column=2).value
mylines["Column_name"].append(name) #Appending dictionary key "Column_name"
mylines["Column_Type"].append(name1) #Appending dictionay key "Column_type"
for index, value in enumerate(mylines["Column_Type"]):
mylines["Column_Type"][index] = type_strs.get(value, value)
theString = " "
for i in range(len(mylines['Column_name'])):
theString += mylines['Column_name'][i] + " " + mylines['Column_Type'][i]
if i < len(mylines['Column_name'])-1:
theString += ", "
outputFile = open('/output.txt', 'w') # Text file Output
outputFile.write("CREATE TABLE TRANSIENT TABLE STG_EMPLOYEE({});".format(theString) + "\n")
outputFile.close() #Closing file
Excel data
<Sheet 1 Name -> Summary Sheet: Empty
<Sheet 2 Name -> Employee Sheet
File Name: Employee
Sheet Name: Employee
File Type: csv
Field Name Type
Name String
Salary Numeric
Date Date
Phone Int
<Sheet 3 Name-> Employee1 Sheet
File Name: Employee
Sheet Name: Employee1
File Type: csv
Field Name Type
Employee Name Date
Employee Salary Int
Employment Date Int
Office Phone Int
Upvotes: 0
Views: 597
Reputation: 9967
To iterate through all worksheets in a workbook and read data in them (except the first worksheet, remove the ws = wb['Employee']
Use a for loop (insert before for i in range(5,...
as this
skip = True
for ws in wb.worksheets:
if skip == True:
skip = False
else:
for i in range(6, ws.max_row+1):
name = ws.cell(row=i, column=1).value
....
This will read each sheet and append data to mylines
, except the first sheet
Second Update As you mentioned in below comment, to add a new line with the new SQL query, please make these additional changes
skip = True
for ws in wb.worksheets:
if skip == True:
skip = False
else:
for i in range(6, ws.max_row+1):
name = ws.cell(row=i, column=1).value
print(i, name)
name1=ws.cell(row=i, column=2).value
print(name1)
mylines["Column_name"].append(name) #Appending dictionary key "Column_name"
mylines["Column_Type"].append(name1) #Appending dictionay key "Column_type"
for index, value in enumerate(mylines["Column_Type"]):
mylines["Column_Type"][index] = type_strs.get(value, value)
mylines["Column_name"].append('NextLine')
mylines["Column_Type"].append('NextLine')
theString = " "
NewFile = True
sheetList = wb.sheetnames
tabIndex = 1
for i in range(len(mylines['Column_name'])):
if(mylines['Column_name'][i] != 'NextLine'):
theString += mylines['Column_name'][i] + " " + mylines['Column_Type'][i]
theString += ", "
else:
theString = theString[:-2]
if NewFile:
NewFile = False
outputFile = open('output.txt', 'w') # Text file Output
print("New file ", theString)
else:
outputFile = open('output.txt', 'a')
print("Not new file ", theString)
outputFile.write("CREATE TABLE TRANSIENT TABLE STG_" + sheetList[tabIndex] +"({});".format(theString) + "\n")
outputFile.close()
tabIndex += 1
theString = " "
Upvotes: 1