Mark Collier

Reputation: 157

CSV Writer writing over itself

I am trying to create a CSV file with a list of URLs.

I am pretty new to programming, so please excuse any sloppy code.

I have a loop that runs through a list of places to get the list of URLs.

I then have a loop within that loop that exports the data to a CSV file.

import urllib, csv, re
from BeautifulSoup import BeautifulSoup
list_of_URLs = csv.reader(open("file_location_for_URLs_to_parse"))
for row in list_of_URLs:
    row_string = "".join(row)
    file = urllib.urlopen(row_string)
    page_HTML = file.read()
    soup = BeautifulSoup(page_HTML) # parsing HTML
    Thumbnail_image = soup.findAll("div", {"class": "remositorythumbnail"})
    Thumbnail_image_string = str(Thumbnail_image)
    soup_3 = BeautifulSoup(Thumbnail_image_string)
    Thumbnail_image_URL = soup_3.findAll('a', attrs={'href': re.compile("^http://")})

This is the part that isn't working for me:

    out  = csv.writer(open("file_location", "wb"), delimiter=";")
    for tag in soup_3.findAll('a', href=True):   
        out.writerow(tag['href'])

Basically, the writer keeps writing over itself. Is there a way to jump below the first empty row in the CSV and start writing there?

Upvotes: 1

Views: 2016

Answers (3)

unutbu

Reputation: 879083

Don't put this inside any loop:

out  = csv.writer(open("file_location", "wb"), delimiter=";")

Instead:

with open("file_location", "wb") as fout:
    out = csv.writer(fout, delimiter=";")
    # put for-loop here

Notes:

  1. open("file_location", "wb") creates a new file, destroying any old file of the same name. This is why it looks like the writer is overwriting old lines.
  2. Use with open(...) as ... because it automatically closes the file for you when the with-block ends. This makes explicit when the file is closed. Otherwise, the file remains open (and maybe not completely flushed) until out is deleted or reassigned to a new value. It's not really your main problem here, but using with is too useful not to mention.
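Putting the pieces together, a minimal sketch of the corrected structure (the file name and the list of hrefs are hypothetical stand-ins for the values scraped with BeautifulSoup):

```python
import csv

# Hypothetical hrefs standing in for the scraped tag['href'] values.
hrefs = ["http://example.com/a.jpg", "http://example.com/b.jpg"]

# Open the output file once, before the loop; each writerow call then adds
# a new line instead of starting over in a freshly truncated file.
with open("urls_out.csv", "w") as fout:  # use "wb" on Python 2
    out = csv.writer(fout, delimiter=";")
    for href in hrefs:
        out.writerow([href])  # wrap in a list: writerow expects a sequence
```

Note also that `writerow(tag['href'])` passes a bare string, which csv treats as a sequence of characters, so each character lands in its own column; wrapping the value in a list writes it as a single field.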

Upvotes: 5

wberry

Reputation: 19317

The open("file_location", "wb") call, which you are making once for every URL, is wiping out what you wrote to that file previously. Move it outside your for loop so the file is opened only once, before any of the URLs are processed.

Upvotes: 0

varunl

Reputation: 20229

Check whether you are reopening the file before every write (or closing it after every write).
Also, try opening the file in "ab" mode instead of "wb"; "ab" appends to the file rather than truncating it.
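For illustration, a quick sketch of the append-mode variant (file name hypothetical):

```python
import csv

# "a" (append) resumes at the end of the file instead of truncating it,
# so rows written during earlier opens survive.
with open("urls_out.csv", "a") as fout:  # use "ab" on Python 2
    out = csv.writer(fout, delimiter=";")
    out.writerow(["http://example.com/c.jpg"])
```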

Upvotes: 1
