Reputation: 254
I have some code that parses HTML with BeautifulSoup, and prints the code. Here is the source code (gist linked if interested):
import csv
import requests
from bs4 import BeautifulSoup
import lxml
r = requests.post('https://opir.fiu.edu/instructor_evals/instr_eval_result.asp', data={'Term': '1175', 'Coll': 'CBADM'})
soup = BeautifulSoup(r.text, "lxml")
tables = soup.find_all('table')
print(tables)
The output of my code before exporting to CSV looks like this:
Question	No Response	Excellent	Very Good	Good	Fair	Poor
Description of course objectives and assignments	0.0%	76.1%	17.4%	6.5%	0.0%	0.0%
Communication of ideas and information	0.0%	78.3%	17.4%	4.3%	0.0%	0.0%
I really liked this output, and wanted to export it to a CSV, so I then added the following:
writer = csv.writer(open("C:\\Temp\\output_file.csv", 'w'))
for table in tables:
    rows = table.find_all("tr")
    for row in rows:
        cells = row.find_all("td")
        if len(cells) == 7:  # this filters out rows with 'Term', 'Instructor Name' etc.
            for cell in cells:
                print(cell.text + "\t", end="")
                writer.writerow(cell.text)
            print("")  # newline after each row
    print("-------------")  # table delimiter
Unfortunately, this code puts every single character into its own cell in the CSV file.
So my question is this: how can I fix this code so that it properly exports the output to a CSV file, without adding a new cell for each and every character? I'm not exactly sure why it is doing this. It also seems to only be exporting the very first table, and ignoring every other piece of data in the code.
Upvotes: 1
Views: 1474
Reputation: 403128
cell.text is a string, but writerow needs an iterable of data, so it can write each element to its own cell. Since you passed a string, each character is treated as a separate element and written to a separate cell.
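A minimal sketch of that behaviour, assuming an in-memory buffer in place of the output file. The sample value is taken from the question's output; the variable names are only illustrative.

```python
import csv
import io

# In-memory buffer standing in for the CSV file (illustrative only).
buffer = io.StringIO()
writer = csv.writer(buffer)

# A bare string is iterated character by character,
# so every character ends up as its own field.
writer.writerow("76.1%")

split_output = buffer.getvalue()
print(split_output)
```

The single value is written as five separate fields, which matches the one-character-per-cell result described in the question.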
You'll have to wrap the string in [] to get it working, so that you're passing a list containing one string:
writer.writerow([cell.text])
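Note that this call writes each cell on its own CSV row. If the goal is one CSV row per table row, a possible variant is to collect the texts of a row into a single list and call writerow once per row. The sketch below is only an illustration under stated assumptions: plain lists of strings stand in for the cells found by BeautifulSoup, and an in-memory buffer stands in for the output file.

```python
import csv
import io

# Hypothetical cell texts for two table rows (values from the question's output).
table_rows = [
    ["Description of course objectives and assignments",
     "0.0%", "76.1%", "17.4%", "6.5%", "0.0%", "0.0%"],
    ["Communication of ideas and information",
     "0.0%", "78.3%", "17.4%", "4.3%", "0.0%", "0.0%"],
]

# In-memory buffer standing in for the CSV file (illustrative only).
buffer = io.StringIO()
writer = csv.writer(buffer)

for cells in table_rows:
    # One list per table row gives one CSV line with seven fields.
    writer.writerow([text.strip() for text in cells])

csv_lines = buffer.getvalue().splitlines()
print(csv_lines)
```

In the question's loop, the equivalent would be a single writer.writerow([cell.text for cell in cells]) placed after the len(cells) == 7 check instead of inside the inner for loop. When writing to a real file, the csv module documentation recommends opening it with newline='' (for example open(path, 'w', newline='')), which avoids blank lines between rows on Windows.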
Upvotes: 2