Gilda Romano

Reputation: 21

Use URLs scraped from a webpage with BeautifulSoup

As per title, I have scraped the webpage I'm interested in and saved the URLs in a variable.

import requests
from bs4 import BeautifulSoup

for pagenumber in range(1, 2):
    url = 'https://www.congress.gov/search?q=%7B%22source%22%3A%22legislation%22%2C%22congress%22%3A%22112%22%7D&page={}'.format(pagenumber)
    res = requests.get(url, headers = {'User-agent': 'Chrome'})

soup = BeautifulSoup(res.text, 'html.parser')
lists = soup.find_all("li", {"class" : "expanded"})

for bill in lists:
    block = bill.find("span", {"class":"result-item"})
    link_cosponsors = block.find_all("a")[1]['href'] # I am interested in the second URL

The last line gives me the URLs. Now I am struggling to visit each of these URLs and scrape new information from each page.

for url in link_cosponsors:

    soup_cosponsor = BeautifulSoup(requests.get(url).text, 'html.parser')
    table = soup.find('table', {'class':'item_table'})

I think the issue is with the way link_cosponsors is created: the first element isn't the full 'https://etc.' but only 'h', because I get the error "Invalid URL 'h': No schema supplied. Perhaps you meant http://h?". I have tried appending the links to a list, but that isn't working either.

Upvotes: 2

Views: 56

Answers (1)

Valdir Stumm Junior

Reputation: 4667

The problem is that you're reassigning link_cosponsors at each iteration of the for loop. As a result, the variable ends up holding only the last link you've found, as a single string.

Your for url in link_cosponsors loop then iterates over that string character by character. Basically like this:

for letter in 'http://the.link.you.want/foo/bar':
    print(letter)
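To make the error message concrete, here is a small sketch of the same effect without any network access. The two example.com links are hypothetical stand-ins for the scraped hrefs; the point is only that the variable ends up as one string whose first "element" is 'h'.

```python
# Hypothetical stand-ins for the hrefs found on the results page.
hrefs = ['https://example.com/bill/1', 'https://example.com/bill/2']

link_cosponsors = None
for href in hrefs:
    # Plain assignment overwrites the previous value on every pass,
    # so only the last href survives the loop.
    link_cosponsors = href

# Iterating over a string yields its characters one at a time.
first_url = next(iter(link_cosponsors))

print(type(link_cosponsors).__name__)
print(repr(first_url))
```

Passing that single character to requests.get is what produces "Invalid URL 'h': No schema supplied".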

Solution: you should replace the last 3 lines of the first snippet with:

link_cosponsors = []
for bill in lists:
    block = bill.find("span", {"class":"result-item"})
    link_cosponsors.append(block.find_all("a")[1]['href'])
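As a rough sketch, the same collection step can be wrapped in a helper that returns a list ready to loop over. The function name extract_cosponsor_links, the base_url parameter, and the sample HTML are assumptions made for illustration. The urljoin call is only a safeguard in case some hrefs turn out to be relative; absolute URLs pass through unchanged.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_cosponsor_links(html, base_url):
    """Return the second link of every result item as an absolute URL."""
    soup = BeautifulSoup(html, 'html.parser')
    links = []
    for bill in soup.find_all("li", {"class": "expanded"}):
        block = bill.find("span", {"class": "result-item"})
        if block is None:
            continue  # skip entries without the expected span
        anchors = block.find_all("a")
        if len(anchors) < 2:
            continue  # skip entries without a second link
        # Append instead of reassigning, so every link is kept.
        links.append(urljoin(base_url, anchors[1]['href']))
    return links


# Hypothetical markup that mimics the structure used in the question.
sample_html = (
    '<ul>'
    '<li class="expanded"><span class="result-item">'
    '<a href="/bill/1">Bill 1</a><a href="/bill/1/cosponsors">Cosponsors</a>'
    '</span></li>'
    '<li class="expanded"><span class="result-item">'
    '<a href="/bill/2">Bill 2</a>'
    '<a href="https://www.congress.gov/bill/2/cosponsors">Cosponsors</a>'
    '</span></li>'
    '</ul>'
)

for url in extract_cosponsor_links(sample_html, 'https://www.congress.gov/search'):
    print(url)
```

Each URL in the returned list can then be passed to requests.get as in your second snippet. Note that the table lookup there should probably be called on soup_cosponsor rather than on the outer soup, otherwise it searches the original results page again.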

Upvotes: 2
