Reputation: 87
I recently tried to scrape the quotes from http://quotes.toscrape.com/ (first page only) and save them to a CSV file. I got a pretty weird result: the text came out broken up, with commas as the only separators. See the screenshot and code below:
from bs4 import BeautifulSoup
from urllib.request import urlopen
import csv
csvfile = open('quotes.csv', 'w')
writer = csv.writer(csvfile)
writer.writerow(('text'))
def parse():
    html = urlopen('http://quotes.toscrape.com/page/1/')
    bs = BeautifulSoup(html, 'lxml')
    quotes = bs.findAll('div', class_='quote')
    for quote in quotes:
        try:
            text = quote.find('span', class_='text').getText(
            ).replace(',', '|').replace('"', '')
            print(text)
            writer.writerow((text))
        except UnicodeEncodeError:
            break
parse()
csvfile.close()
Upvotes: 0
Views: 119
Reputation: 4682
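You've attempted to call writerow with a tuple; however (weird quirk), you're not actually passing a tuple. Parentheses alone only group an expression. It's the trailing comma that makes a tuple, so (text) is just the string text.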
See my example:
some_num = (1)     # an int: the parentheses only group the value
some_tuple = (1,)  # a tuple with one element
Change this line:
writer.writerow((text))
to
writer.writerow((text,))
Note the comma :)
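If you want to convince yourself, here is a minimal sketch that checks the types directly. The sample value of `text` is made up for illustration and is not taken from the scraped page:

```python
text = "hello"  # hypothetical sample value, not from the scraped page

plain = (text)    # parentheses only group the expression: still a str
single = (text,)  # the trailing comma makes a one-element tuple
as_list = [text]  # a one-element list is another way to wrap a single field

print(type(plain).__name__)
print(type(single).__name__)
print(len(single), len(as_list))
```

Either `(text,)` or `[text]` should work with writerow, since both are sequences containing one field.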
But why did that happen?
Rather than raising an error, writerow iterated over the string as if it were a tuple of single characters and wrote each character as its own field, e.g.
>>> for character in "this string":
...     print(character)
...
t
h
i
s
 
s
t
r
i
n
g
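The same effect can be reproduced without touching the website. This is only a sketch: it writes into an in-memory `io.StringIO` buffer instead of quotes.csv, and the sample `text` is an assumption for illustration:

```python
import csv
import io

text = "hello"  # hypothetical sample value

# Bare string: the csv writer iterates over it character by character.
buf_wrong = io.StringIO()
csv.writer(buf_wrong).writerow((text))
wrong = buf_wrong.getvalue()

# One-element tuple: the whole string is written as a single field.
buf_right = io.StringIO()
csv.writer(buf_right).writerow((text,))
right = buf_right.getvalue()

print(repr(wrong))  # 'h,e,l,l,o\r\n'
print(repr(right))  # 'hello\r\n'
```

The first row has one field per character, which matches the comma-separated output described in the question.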
Upvotes: 1