Reputation: 157
I am trying to create a CSV file with a list of URLs.
I am pretty new to programming, so please excuse any sloppy code.
I have a loop that runs through a list of places to get the list of URLs.
I then have a loop within that loop that exports the data to a CSV file.
import urllib, csv, re
from BeautifulSoup import BeautifulSoup
list_of_URLs = csv.reader(open("file_location_for_URLs_to_parse"))
for row in list_of_URLs:
    row_string = "".join(row)
    file = urllib.urlopen(row_string)
    page_HTML = file.read()
    soup = BeautifulSoup(page_HTML) # parsing HTML
    Thumbnail_image = soup.findAll("div", {"class": "remositorythumbnail"})
    Thumbnail_image_string = str(Thumbnail_image)
    soup_3 = BeautifulSoup(Thumbnail_image_string)
    Thumbnail_image_URL = soup_3.findAll('a', attrs={'href': re.compile("^http://")})
This is the part that isn't working for me:
out = csv.writer(open("file_location", "wb"), delimiter=";")
for tag in soup_3.findAll('a', href=True):
    out.writerow(tag['href'])
Basically, the writer keeps overwriting what it has already written. Is there a way to jump to the first empty row in the CSV and start writing from there?
Upvotes: 1
Views: 2016
Reputation: 879083
Don't put this inside any loop:
out = csv.writer(open("file_location", "wb"), delimiter=";")
Instead:
with open("file_location", "wb") as fout:
    out = csv.writer(fout, delimiter=";")
    # put for-loop here
Notes:
open("file_location", "wb") creates a new file, destroying any old file of the same name. This is why it looks like the writer is overwriting old lines.
Use with open(...) as ... because it automatically closes the file for you when the with-block ends. This makes explicit when the file is closed. Otherwise, the file remains open (and maybe not completely flushed) until out is deleted or reassigned to a new value. It's not really your main problem here, but using with is too useful not to mention.
Upvotes: 5
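For anyone who wants to see the "open once, loop inside" layout end to end, here is a minimal sketch. It is not the asker's script: it is written for Python 3 (so the file is opened with "w" and newline="" rather than "wb"), the write_links helper and the example.com links are made up for illustration, and a plain list of lists stands in for the hrefs scraped from each page. It also wraps each href in a list, since writerow expects a sequence of fields and would otherwise treat a bare string as one field per character.

```python
import csv
import os
import tempfile


def write_links(links_per_page, out_path):
    """Write every link from every page to one semicolon-delimited CSV file."""
    # Open the output file once, before looping over the pages.
    # Python 3 wants text mode with newline="" for csv; Python 2 would use "wb".
    with open(out_path, "w", newline="") as fout:
        out = csv.writer(fout, delimiter=";")
        for links in links_per_page:   # outer loop: one entry per parsed page
            for href in links:         # inner loop: one row per link
                # Wrap the string in a list so it is written as a single field.
                out.writerow([href])


# Hypothetical data standing in for the hrefs scraped from two pages.
pages = [
    ["http://example.com/a", "http://example.com/b"],
    ["http://example.com/c"],
]

# Write to a throwaway temp directory so the demo does not touch real files.
demo_path = os.path.join(tempfile.mkdtemp(), "links_demo.csv")
write_links(pages, demo_path)
```

Because the file is opened only once, rows from the second page land after the rows from the first page instead of replacing them.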
Reputation: 19317
The open("file_location", "wb") call, which you are doing once for every URL, is wiping out what you did to that file previously. Move it outside your for loop so that it is only opened once for all the URLs.
Upvotes: 0
Reputation: 20229
Are you closing the file after every write, or opening the file before every write? Just check that.
Also, try using "ab" mode instead of "wb". "ab" will append to the file.
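A quick way to see the difference between the two modes is the sketch below. It assumes Python 3, where the text-mode equivalents "w" and "a" (with newline="") take the place of "wb" and "ab"; the write_row and read_rows helpers and the temp file are made up for the demonstration.

```python
import csv
import os
import tempfile


def write_row(path, mode, value):
    """Open the file in the given mode and write one single-field row."""
    with open(path, mode, newline="") as f:
        csv.writer(f, delimiter=";").writerow([value])


def read_rows(path):
    """Read the whole file back as a list of rows."""
    with open(path, newline="") as f:
        return list(csv.reader(f, delimiter=";"))


demo_path = os.path.join(tempfile.mkdtemp(), "mode_demo.csv")

# "w" truncates the file on every open, so only the last row survives.
write_row(demo_path, "w", "first")
write_row(demo_path, "w", "second")
rows_after_w = read_rows(demo_path)

# "a" keeps the existing content and adds new rows at the end.
write_row(demo_path, "a", "third")
write_row(demo_path, "a", "fourth")
rows_after_a = read_rows(demo_path)
```

One thing to keep in mind with append mode: rows from earlier runs of the script also stay in the file, so rerunning it will add the same URLs again unless the file is cleared first.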
Upvotes: 1