ak120908

Reputation: 21

How to loop through a list of URLs in Python for web scraping

Very new to Python and struggling with this loop. I'm trying to pull the HTML attribute data-address from a list of static pages that I already have in list format. I've managed to use BS4 to pull the data from one page, but I cannot get the loop right to iterate through my list of URLs. Right now I am receiving this error (Invalid URL '0': No schema supplied. Perhaps you meant http://0?), yet when I checked the URLs in single pulls they all work. Here is my working single-pull code:

import requests
from bs4 import BeautifulSoup

result = requests.get('https://www.coingecko.com/en/coins/0xcharts')
src = result.content
soup = BeautifulSoup(src, 'lxml')

contract_address = soup.find(
    'i', attrs={'data-title': 'Click to copy'})

print(contract_address.attrs['data-address'])

This is the loop I am working on:

import requests
from bs4 import BeautifulSoup

url_list = ['https://www.coingecko.com/en/coins/2goshi','https://www.coingecko.com/en/coins/0xcharts']

for link in range(len(url_list)):
    result = requests.get(link)
    src = result.content
    soup = BeautifulSoup(src, 'lxml')

    contract_address = soup.find(
    'i', attrs={'data-title': 'Click to copy'})

    print(contract_address.attrs['data-address'])

url_list.seek(0)

Upvotes: 2

Views: 4389

Answers (2)

MendelG

Reputation: 20008

You have misunderstood the usage of range(). Please read the docs.

When you do:

result = requests.get(link)

link is an int value coming from range(); see what happens when you print(link). Instead, use it to index into the list url_list as follows:

result = requests.get(url_list[link])
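To see the distinction offline, here is a small sketch (the example URLs are made-up placeholders):

```python
url_list = ['https://example.com/a', 'https://example.com/b']

for link in range(len(url_list)):
    print(link)            # plain integers: 0, then 1 — not URLs
    print(url_list[link])  # indexing turns the integer back into a URL
```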

Here's a full example:

import requests
from bs4 import BeautifulSoup

url_list = ['https://www.coingecko.com/en/coins/2goshi','https://www.coingecko.com/en/coins/0xcharts']

for link in range(len(url_list)):
    result = requests.get(url_list[link])
    src = result.content
    soup = BeautifulSoup(src, 'lxml')

    contract_address = soup.find(
        'i', attrs={'data-title': 'Click to copy'})

    print(contract_address.attrs['data-address'])

Output:

0x70e132641d6f1bd787b119a289fee544fbb2f316
0x86dd49963fe91f0e5bc95d171ff27ea996c0890c
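As a side note, the find()/attrs extraction step can be exercised offline against a small made-up HTML fragment (the markup below is a stand-in for the real page, and html.parser is used here only to avoid the lxml dependency):

```python
from bs4 import BeautifulSoup

# A made-up fragment mimicking the <i> tag the scraper targets.
html = '<i data-title="Click to copy" data-address="0xdeadbeef">copy</i>'
soup = BeautifulSoup(html, 'html.parser')

tag = soup.find('i', attrs={'data-title': 'Click to copy'})
if tag is not None:  # find() returns None when nothing matches
    print(tag.attrs['data-address'])  # prints 0xdeadbeef
```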

Upvotes: 1

Bahae El Hmimdi

Reputation: 368

Try iterating over the list directly instead of over range():

import requests
from bs4 import BeautifulSoup

url_list = ['https://www.coingecko.com/en/coins/2goshi','https://www.coingecko.com/en/coins/0xcharts']

for link in url_list:
    result = requests.get(link)
    src = result.content
    soup = BeautifulSoup(src, 'lxml')

    contract_address = soup.find(
        'i', attrs={'data-title': 'Click to copy'})

    print(contract_address.attrs['data-address'])

Note that url_list.seek(0) from your code is dropped here: Python lists have no seek() method — that belongs to file objects.

Upvotes: 1
