Reputation: 601
I am developing an ETL (extract, transform, load) tool to automate some office work. There's an individual data set for each employee; scraping a website (with the employee ID number as an input variable) gives me a data set.
The main function looks like this:
for emp_number in all_emp_numbers:
    # scrape html from website:
    web_element_scraper.main_function(str(emp_number))
    time.sleep(5)
    # take scraped html from one html file, clean it, and place the cleaned html in
    # another file:
    web_element_cleaner.main_function('results_cache.html', 'detagged_markup.html')
    time.sleep(5)
    # convert html table into a set of nested lists:
    table_values = table_converter.desired_table
    time.sleep(3)
    # upload nested lists (i.e., a "list of lists") into an MS SQL Server 2008 database:
    db_upload.main_function(str(emp_number))
    time.sleep(5)
print("ETL process completed.")
All pertinent modules have been imported, etc.
Here's where my problem begins: if I run this script for any single employee number, the script executes fine, and the values appear in the MS SQL Server db table as intended.
However, when I have multiple employee numbers and the loop has to run several times, the first iteration works as intended and the values are successfully uploaded to the table, but the very next iteration, regardless of which employee number it is, fails.
Based on some debugging and testing, I've found that the problem is confined to the SQL Server connection; on every subsequent iteration I get this error:
pymssql.InterfaceError: Connection is closed.
I don't really know how to fix this issue, especially since the connection had been working just fine for the very first iteration.
EDITED TO ADD: Here's the db_upload.py module:
import pymssql
import credentials_page
import table_converter
import time
db_connect = pymssql.connect(
    server = credentials_page.database_server,
    user = credentials_page.database_username,
    password = credentials_page.database_password,
    database = credentials_page.database_name
)

def main_function(emp_id):
    my_cursor = db_connect.cursor()
    data_table = table_converter.desired_table
    # adjust date format:
    for data_row in data_table:
        data_row[0] = data_row[0][:26]
        data_row.append(emp_id)
    for individual_line in data_table:
        my_cursor.execute("INSERT INTO [db_table_name] VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)", tuple(individual_line))
    db_connect.commit()
    time.sleep(3)
    db_connect.close()
    time.sleep(2)
    print("Data uploaded.")
Upvotes: 0
Views: 3243
Reputation: 990
The reason you cannot submit more than one record is that you call .close() inside your function, which closes the connection. The connection itself is only created once, at module level, when the module is imported; so on the second call the connection is already closed. You would need to call connect() inside the function instead (or stop closing the connection there).
See the relevant pymssql docs here
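Here is a minimal sketch of how db_upload.py could be restructured so the connection is opened (and closed) on each call; it assumes the same credentials_page and table_converter modules and the same nine-column [db_table_name] table from your question:

import pymssql
import credentials_page
import table_converter

def main_function(emp_id):
    # Open a fresh connection on every call instead of once at import time.
    db_connect = pymssql.connect(
        server=credentials_page.database_server,
        user=credentials_page.database_username,
        password=credentials_page.database_password,
        database=credentials_page.database_name
    )
    try:
        my_cursor = db_connect.cursor()
        data_table = table_converter.desired_table
        # adjust date format and append the employee id to each row, as in the original:
        for data_row in data_table:
            data_row[0] = data_row[0][:26]
            data_row.append(emp_id)
        for individual_line in data_table:
            my_cursor.execute(
                "INSERT INTO [db_table_name] VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)",
                tuple(individual_line)
            )
        db_connect.commit()
    finally:
        # Always close the per-call connection, even if an insert fails.
        db_connect.close()
    print("Data uploaded.")

Reconnecting once per employee adds a little overhead, but at this scale it is negligible and keeps the function self-contained. Alternatively, you could keep the module-level connection and simply remove the .close() call from main_function, closing the connection once after the loop in the calling script has finished.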
Upvotes: 3