user3516855

Reputation: 15

Trying to write results of a set into a csv file in Python but only getting one line to print

Stuck on something and hoping to get some ideas on what I'm doing wrong here. I have written a web scraping program that scrapes all web links from the census.gov website, but when I try to write my results out to a CSV file, only one of the links is written instead of the full list. See the code below. To check that my set was populated correctly, I added a line that prints each link as it is added to the set before the results are written out. The printed results look correct: I see the full list of links. However, I am not sure why only one row of data ends up in the CSV file:

import requests
from bs4 import BeautifulSoup, SoupStrainer
import bs4, csv
search_link = "https://www.census.gov/programs-surveys/popest.html"
search = requests.get(search_link).text
raw_html = search
soup = BeautifulSoup(raw_html, 'html.parser')
import re
links = soup.find_all('a', {'class': re.compile('uscb*')})
urls_set = set()
for link in links:
    my_links = link.get("href")
    if my_links not in urls_set:
        urls_set.add(my_links)
        print(my_links)
with open("Current Estimate Result.csv",'wb') as f:
    cw = csv.writer(f)
    cw.writerows(my_links)
    print(my_links)
    f.close()

Upvotes: 0

Views: 55

Answers (1)

Will Keeling

Reputation: 23024

The issue is that by the time the loop has finished, the my_links variable holds only the last URL read. So cw.writerows(my_links) writes out just that URL, not all of the URLs, which are actually stored in urls_set.

However, your usage of the writerows() method isn't entirely correct either. The method expects an iterable of row objects (typically a list of lists), where each nested list represents one row in the CSV file. If you pass it a single string, it will iterate over the string's characters instead.
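A quick way to see what writerows() expects, using io.StringIO as a stand-in for a file (the example URLs are placeholders):

```python
import csv
import io

buf = io.StringIO()
cw = csv.writer(buf)

# Correct usage: an iterable of rows, where each row is a list of cells.
cw.writerows([["https://example.com/a"], ["https://example.com/b"]])
print(buf.getvalue())  # each URL appears on its own line

# By contrast, cw.writerows("abc") would iterate over the string's
# characters and emit one character per row.
```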

So it may be better to hold the URLs in a list rather than a set, and to wrap each URL in its own list (row) before adding it. For example:

urls_list = []
for link in links:
    my_link = [link.get("href")]  # A row in the csv
    if my_link not in urls_list:
        urls_list.append(my_link)

with open("Current Estimate Result.csv", 'w', newline='') as f:  # text mode, not 'wb', for csv on Python 3
    cw = csv.writer(f)
    cw.writerows(urls_list)  # Pass the overall list

Note I renamed my_links to my_link in the example above. Using a list also ensures that the links are written in the order they were found, which a set does not guarantee.
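One trade-off: on a page with many links, the my_link not in urls_list check scans the whole list each time. A variation (just a sketch; write_unique_links and the hrefs argument are made-up names, not from the question) keeps a set alongside the list, so membership tests stay O(1) while the output still preserves first-seen order:

```python
import csv

def write_unique_links(hrefs, path):
    """Write each unique href to its own CSV row, preserving first-seen order."""
    seen = set()   # O(1) membership tests
    rows = []      # one-element rows, in discovery order
    for href in hrefs:
        if href and href not in seen:  # skip None (no href attribute) and duplicates
            seen.add(href)
            rows.append([href])
    with open(path, "w", newline="") as f:  # text mode with newline='' for csv on Python 3
        csv.writer(f).writerows(rows)

# Example: duplicates and missing hrefs are dropped, order is kept.
write_unique_links(["/a.html", "/b.html", "/a.html", None], "links.csv")
```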

Upvotes: 1
