Cloud

Reputation: 399

Write csv output to StringIO object

The following code fails to create a temporary csv file in a StringIO object. Is there a mistake somewhere in the code? Reading from the "data_temp" variable keeps returning an empty result.

I am using the StringIO object in order to avoid creating another file on disk.

from bs4 import BeautifulSoup
from io import StringIO

import csv
import re


# Creates a new csv file to import data to MySQL
def create_csv_file():
    source_html = open(r'C:\\Users\\Admin\\OneDrive\\eCommerce\\Servi-fied\\Raw Data\\EMA - Electricians (Raw).txt', 'r')
    bs_object = BeautifulSoup(source_html, "html.parser")

    data_temp = StringIO()
    csv_file1 = open(r'C:\\Users\\Admin\\OneDrive\\eCommerce\\Servi-fied\\Raw Data\\EMA - Electricians (Processed).csv', 'w+')

    writer1 = csv.writer(data_temp, delimiter='<', skipinitialspace=True)

    table = bs_object.find("table", {"id":"gasOfferSearch"})
    rows = table.findAll("tr")
    # Debugging statement
    print("There are " + (len(rows) - 1).__str__() + " rows.")

    try:
        # Iterates through the list, but skips the first record (i.e. the table header)
        counter = 0
        for row in rows[1:]:
            csvRow = []
            for cell in row.findAll(['td','th']):
                # Replace "\n" with a whitespace; replace <br> tags with 5 whitespaces
                line = str(cell).replace('\n', ' ').replace('<br>', '     ')
                # Replace runs of 2 or more whitespace characters with "*"
                line = re.sub(r'\s{2,}', '*', line)
                # Converts results to a BeautifulSoup object
                line_bsObj = BeautifulSoup(line, "html.parser")
                # Strips: Removes all tags and trailing and leading whitespaces
                # Replace: Removes all quotation marks
                csvRow.append(line_bsObj.get_text().strip().replace('"',''))

            # Converts the string into a csv file
            writer1.writerow(csvRow)
            print(data_temp.readlines())
            counter += 1

        # Debugging statement
        print("There are " + counter.__str__() + " rows.")
        print(data_temp.readlines())

        # Reads from the temp file and replaces all "<*" with "<"
        csv_file1.write(
            data_temp.read().replace("<*", "<").replace("*\n", "").replace("*", "<", 1)
        )

    finally:
        source_html.close()
        csv_file1.close()

    return None

# Execute the following functions
create_csv_file()

Upvotes: 1

Views: 6108

Answers (1)

user707650

You're writing to the StringIO object, data_temp, and then immediately attempt to read from it:

data_temp = StringIO()
writer1 = csv.writer(data_temp, delimiter='<', skipinitialspace=True)
...
writer1.writerow(csvRow)
print(data_temp.readlines())

At that moment (and ditto later), data_temp's "file" pointer is at the end of the stream. So you're attempting to read past the end of the current file, resulting in no data.

If you want to do things this way, seek to the start of data_temp first, before reading:

data_temp.seek(0)
result = data_temp.read()
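
To make the pointer behaviour concrete, here is a minimal, self-contained sketch. The names buffer and writer and the sample row are made up for illustration and are not taken from the question's code:

```python
import csv
from io import StringIO

buffer = StringIO()
writer = csv.writer(buffer, delimiter='<')
writer.writerow(['a', 'b'])

# After writing, the stream position sits at the end of the data.
position_after_write = buffer.tell()

# Reading from the end of the stream yields an empty string.
read_at_end = buffer.read()

# Rewind to the start, then read everything that was written.
buffer.seek(0)
read_after_seek = buffer.read()

print(repr(read_at_end))
print(repr(read_after_seek))
```

Note that every read() or readlines() call moves the position forward again, so a debugging print inside the loop would need its own seek(0), and would then leave the position at the end of the data once more.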

(But, without thoroughly diving into your code, I'd hazard a guess that there's another way to do what you're doing, without writing and reading into and from a temporary object.)
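
One possibility along those lines, offered only as a sketch: StringIO.getvalue() returns the entire buffer contents regardless of where the stream position is, so no seek is needed before post-processing the text. The rows below are placeholder data, and lineterminator='\n' is an assumption made here to keep the output simple:

```python
import csv
from io import StringIO

buffer = StringIO()
# lineterminator='\n' is an illustrative choice; csv.writer defaults to '\r\n'.
writer = csv.writer(buffer, delimiter='<', lineterminator='\n')
writer.writerows([['x', 'y'], ['1', '2']])

# getvalue() ignores the current stream position and returns everything written so far.
contents = buffer.getvalue()
print(repr(contents))
```

The resulting string could then be passed through the same replace() calls as in the question before being written to the output file.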

Upvotes: 6
