Reputation:
I am trying to output the scraped data from a website into a CSV file. At first I was running into a UnicodeEncodeError, but after adding this piece of code:
if __name__ == "__main__":
    reload(sys)
    sys.setdefaultencoding("utf-8")
I am able to generate the CSV. Below is the full code:
import csv
import urllib2
import sys
from bs4 import BeautifulSoup
if __name__ == "__main__":
    reload(sys)
    sys.setdefaultencoding("utf-8")

    page = urllib2.urlopen('http://www.att.com/shop/wireless/devices/smartphones.html').read()
    soup = BeautifulSoup(page)
    soup.prettify()
    for anchor in soup.findAll('a', {"class": "clickStreamSingleItem"}):
        print anchor['title']
        with open('Smartphones.csv', 'wb') as csvfile:
            spamwriter = csv.writer(csvfile, delimiter=',')
            spamwriter.writerow([anchor['title']])
But I am only getting one device name in the output CSV. I don't have any programming background, so pardon my ignorance. Can you please help me pinpoint the issue?
Upvotes: 1
Views: 763
Reputation: 1121416
That's to be expected; you write the file from scratch each time you find an element. Open the file only once before looping over the links, then write rows for each anchor you find:
with open('Smartphones.csv', 'wb') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=',')
    for anchor in soup.findAll('a', {"class": "clickStreamSingleItem"}):
        print anchor['title']
        spamwriter.writerow([anchor['title'].encode('utf8')])
Opening a file for writing with w clears the file first, and you were doing that once for each anchor.
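To make the difference concrete, here is a minimal self-contained sketch (Python 3 syntax, so open(..., 'w', newline='') instead of the 'wb' used in the Python 2 code above; the title strings are made-up stand-ins for the scraped anchor titles):

```python
import csv
import os
import tempfile

# Hypothetical titles standing in for the scraped anchor titles.
titles = ['Phone A', 'Phone B', 'Phone C']
path = os.path.join(tempfile.mkdtemp(), 'Smartphones.csv')

# Buggy pattern: reopening with 'w' inside the loop truncates the file
# on every iteration, so only the last title survives.
for title in titles:
    with open(path, 'w', newline='') as csvfile:
        csv.writer(csvfile).writerow([title])
with open(path, newline='') as csvfile:
    buggy_rows = list(csv.reader(csvfile))

# Fixed pattern: open the file once, then write every row inside the loop.
with open(path, 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    for title in titles:
        writer.writerow([title])
with open(path, newline='') as csvfile:
    fixed_rows = list(csv.reader(csvfile))

print(len(buggy_rows), len(fixed_rows))  # 1 3
```

The buggy version leaves exactly one row in the file; the fixed version keeps all three.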
As for your Unicode error: avoid, at all costs, changing the default encoding. Instead, encode your rows properly; I did so in the example above, so you can remove the whole sys.setdefaultencoding() call (and the reload() before it).
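To see what explicit encoding does, here is a tiny sketch (Python 3 syntax; the title is a made-up example containing a non-ASCII character). Encoding at the point of writing produces plain UTF-8 bytes, which is what the csv module needs when the file is opened in binary mode in Python 2, without touching the interpreter-wide default:

```python
# Hypothetical anchor title containing a non-ASCII character (the trademark sign).
title = 'Samsung Galaxy S\u2122'

# Encode explicitly instead of calling sys.setdefaultencoding():
# the result is a plain UTF-8 byte string, and the round trip is lossless.
encoded = title.encode('utf8')
print(encoded)  # b'Samsung Galaxy S\xe2\x84\xa2'
```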
Upvotes: 1