The following code is not able to create a temporary CSV file in a StringIO object. Is there a mistake somewhere in the code? The "data_temp" variable keeps coming back empty.
I am using the StringIO object in order to avoid creating another file on disk.
from bs4 import BeautifulSoup
from io import StringIO
import csv
import re

# Creates a new csv file to import data to MySQL
def create_csv_file():
    source_html = open(r'C:\\Users\\Admin\\OneDrive\\eCommerce\\Servi-fied\\Raw Data\\EMA - Electricians (Raw).txt', 'r')
    bs_object = BeautifulSoup(source_html, "html.parser")
    data_temp = StringIO()
    csv_file1 = open(r'C:\\Users\\Admin\\OneDrive\\eCommerce\\Servi-fied\\Raw Data\\EMA - Electricians (Processed).csv', 'w+')
    writer1 = csv.writer(data_temp, delimiter='<', skipinitialspace=True)

    table = bs_object.find("table", {"id":"gasOfferSearch"})
    rows = table.findAll("tr")

    # Debugging statement
    print("There are " + (len(rows) - 1).__str__() + " rows.")

    try:
        # Iterates through the list, but skips the first record (i.e. the table header)
        counter = 0
        for row in rows[1:]:
            csvRow = []
            for cell in row.findAll(['td','th']):
                # Replace "\n" with a whitespace; replace <br> tags with 5 whitespaces
                line = str(cell).replace('\n', ' ').replace('<br>', ' ')
                # Replace 2 or more whitespace characters with "*"
                line = re.sub(r'\s{2,}', '*', line)
                # Converts results to a BeautifulSoup object
                line_bsObj = BeautifulSoup(line, "html.parser")
                # Strips: Removes all tags and trailing and leading whitespaces
                # Replace: Removes all quotation marks
                csvRow.append(line_bsObj.get_text().strip().replace('"',''))
            # Converts the string into a csv file
            writer1.writerow(csvRow)
            print(data_temp.readlines())
            counter += 1

        # Debugging statement
        print("There are " + counter.__str__() + " rows.")
        print(data_temp.readlines())

        # Reads from the temp file and replaces all "<*" with "<"
        csv_file1.write(
            data_temp.read().replace("<*", "<").replace("*\n", "").replace("*", "<", 1)
        )
    finally:
        source_html.close()
        csv_file1.close()
    return None

# Execute the following functions
create_csv_file()
You're writing to the StringIO object, data_temp, and then immediately attempting to read from it:
data_temp = StringIO()
writer1 = csv.writer(data_temp, delimiter='<', skipinitialspace=True)
...
writer1.writerow(csvRow)
print(data_temp.readlines())
At that moment (and ditto later), data_temp's "file" pointer is at the end of the stream. So you're reading from the end of what has been written so far, and there is no data left to return.
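A minimal reproduction of the symptom, using only the standard library (the row values are made up for illustration): after writerow(), the stream position sits at the end of the written text, so read() and readlines() find nothing.

```python
import csv
from io import StringIO

data_temp = StringIO()
writer1 = csv.writer(data_temp, delimiter='<')

# Illustrative row; any values would do
writer1.writerow(['Acme Electric', '555-0100'])

# The stream position is now at the end of everything written so far
print(data_temp.tell(), len(data_temp.getvalue()))

# Reading from the current position therefore finds nothing
print(repr(data_temp.read()))   # ''
print(data_temp.readlines())    # []
```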
If you want to do things this way, seek to the start of data_temp first, before reading:
data_temp.seek(0)
result = data_temp.read()
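Extending the same illustrative sketch to the fix: rewinding with seek(0) makes the written text readable. StringIO also provides getvalue(), which returns the entire buffer regardless of the current position; that may be simpler when the whole CSV text is needed at once, as in the final csv_file1.write(...) step.

```python
import csv
from io import StringIO

data_temp = StringIO()
writer1 = csv.writer(data_temp, delimiter='<')
writer1.writerow(['Acme Electric', '555-0100'])
writer1.writerow(['Bright Sparks', '555-0199'])

# Option 1: rewind to the start, then read
data_temp.seek(0)
lines = data_temp.readlines()
print(lines)

# Reading moved the position back to the end, so a second read is empty
print(repr(data_temp.read()))  # ''

# Option 2: getvalue() returns the whole buffer without moving the position
everything = data_temp.getvalue()
print(repr(everything))
```

One caveat: a StringIO has a single position shared by reads and writes, so a write that happens after seek(0) but before reading to the end overwrites the existing text instead of appending to it.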
(But, without thoroughly diving into your code, I'd hazard a guess that there's another way to do what you're doing, without writing to and reading from a temporary object.)
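One possible shape for that alternative, offered only as a sketch: clean each cell before it reaches csv.writer, and write straight to the output file. This assumes the cleanup can be done per cell, which the question does not confirm. clean_cell and write_rows are hypothetical names, and the cleanup rule here (collapse whitespace runs, trim, drop double quotes) is a simplified stand-in for the question's "*" and "<" replacements, not a reproduction of them. The rows would come from something like [cell.get_text() for cell in row.find_all(['td', 'th'])].

```python
import csv
import re
from io import StringIO


def clean_cell(text):
    """Hypothetical per-cell cleanup: collapse whitespace runs, trim, drop double quotes."""
    return re.sub(r'\s+', ' ', text).strip().replace('"', '')


def write_rows(rows, out):
    """Write already-extracted rows (lists of strings) directly to a text file object."""
    writer = csv.writer(out, delimiter='<')
    count = 0
    for row in rows:
        writer.writerow([clean_cell(cell) for cell in row])
        count += 1
    return count


# In real code `out` would be open(path, 'w', newline=''); a StringIO stands in here
sample_rows = [['  Acme\n Electric ', '555-0100'], ['"Bright"   Sparks', '555-0199']]
buffer = StringIO()
written = write_rows(sample_rows, buffer)
print(written)
print(repr(buffer.getvalue()))
```

Opening the real output file with newline='' follows the csv module's documented recommendation, so the writer's own line terminators are not translated a second time on Windows.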