Reputation: 15
I'm stuck on something and hoping for some ideas about what I'm doing wrong. I have written a web scraping program that collects all the links from the census.gov website, but when I try to write the results to a CSV file, only one of the links is written instead of the full list. See the code below. To check that my set was being populated correctly, I added a line that prints the results before the CSV is written. The printed results look correct, as I see the full list of links. However, I am not sure why only one row of data is written to the CSV file:
import requests
from bs4 import BeautifulSoup, SoupStrainer
import bs4, csv
search_link = "https://www.census.gov/programs-surveys/popest.html"
search = requests.get(search_link).text
raw_html = search
soup = BeautifulSoup(raw_html, 'html.parser')
import re
links = soup.find_all('a', {'class': re.compile('uscb*')})
urls_set = set()
for link in links:
    my_links = link.get("href")
    if my_links not in urls_set:
        urls_set.add(my_links)
        print(my_links)
with open("Current Estimate Result.csv",'wb') as f:
    cw = csv.writer(f)
    cw.writerows(my_links)
    print(my_links)
    f.close()
Upvotes: 0
Views: 55
Reputation: 23024
The issue is that the my_links variable holds only the last URL read. So cw.writerows(my_links) writes out just that URL and not all of the URLs, which are actually stored in urls_set.
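To make the point concrete, here is a minimal sketch of the same loop with no network access. The hrefs list is a hypothetical stand-in for the values that link.get("href") would return; the other names are taken from the question.

```python
# Hypothetical stand-in for the href values scraped from the page.
hrefs = ["/a.html", "/b.html", "/a.html", "/c.html"]

urls_set = set()
for my_links in hrefs:
    if my_links not in urls_set:
        urls_set.add(my_links)

# After the loop, my_links is just the last value seen,
# while urls_set holds every unique URL.
last_seen = my_links
collected = sorted(urls_set)

print(last_seen)
print(collected)
```

So anything written after the loop should come from the collection, not from the loop variable.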
However, I'm not sure your usage of the writerows() method is entirely correct. This method expects an iterable of row objects (typically a list of lists). Each nested list represents a row in the CSV file.
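As a rough illustration of what writerows() does with different inputs, the sketch below writes to an in-memory buffer instead of a file. The helper name rows_written and the sample URLs are made up for the example.

```python
import csv
import io


def rows_written(rows):
    """Write rows with csv.writer and return the resulting lines."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue().splitlines()


# A plain string is itself iterable, so every character is treated as a row.
as_string = rows_written("/a.html")

# A list of lists gives one CSV row per inner list.
as_rows = rows_written([["/a.html"], ["/b.html"]])

print(as_string)
print(as_rows)
```

Passing a single string is therefore unlikely to give the output you want, even when it does not raise an error.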
So it may be better to hold the URLs in a list rather than a set, and then wrap each URL in its own list (row) before adding. For example:
urls_list = []
for link in links:
    my_link = [link.get("href")] # A row in the csv
    if my_link not in urls_list:
        urls_list.append(my_link)
...
cw.writerows(urls_list) # Pass the overall list
Note that I renamed my_links to my_link in the example above. Using a list also ensures that the order is preserved.
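Putting it together, a possible end-to-end version might look like the sketch below. It assumes Python 3, where the csv module wants the file opened in text mode with newline="" rather than 'wb'. The function names unique_rows and write_links are hypothetical, and skipping missing or empty href values is my own choice rather than something the question asked for.

```python
import csv


def unique_rows(hrefs):
    """Drop missing/empty and duplicate hrefs, keep first-seen order,
    and wrap each URL in its own list so it becomes one CSV row."""
    seen = dict.fromkeys(h for h in hrefs if h)
    return [[h] for h in seen]


def write_links(hrefs, path):
    """Write one URL per row to path and return the number of rows."""
    rows = unique_rows(hrefs)
    # Assumption: Python 3, so text mode with newline="" instead of 'wb'.
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
    return len(rows)
```

With the question's code this could be called as write_links((link.get("href") for link in links), "Current Estimate Result.csv"). The with block closes the file on exit, so the explicit f.close() is not needed.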
Upvotes: 1