Thành Đạt

Reputation: 327

Convert string of href into list of links

I am trying to scrape some statistics, including the match results and the names of the teams in those matches, from Gosugamers using this code:

from bs4 import BeautifulSoup
import requests

for i in range(411):
    try:
        i += 1
        print(i)
        url = 'http://www.gosugamers.net/counterstrike/gosubet?r-page={}'.format(i)
        r = requests.get(url)
        web = BeautifulSoup(r.content,"html.parser")
        table = web.findAll("table", attrs={"class":"simple matches"})
        table = table[1]
        links = table('a')
        for link in links:
            if 'matches' in link.get('href', None):
                if len(link.get('href', None)) != 0:
                    print(link.get('href', None))

    except:
        pass

But when I print link.get('href', None), the result looks like a single string containing all the match links on a page, and I do not know how to turn it into a list of all the links. I would be glad if someone could help me out, thanks!

Upvotes: 2

Views: 2061

Answers (1)

Bemmu

Reputation: 18247

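It looks to me like link.get('href', None) actually returns a single link. Your loop calls it once per <a> tag, so each print outputs one href on its own line, and together those lines can look like one long string. The get method's documentation says: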

get(self, key, default=None) method of bs4.element.Tag instance

Returns the value of the 'key' attribute for the tag, or
the value given for 'default' if it doesn't have that
attribute.
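A quick way to check this is to parse a small hand-written snippet and call get on each <a> tag. The markup below is invented for illustration and is not taken from Gosugamers; each call should give back one string, or None for a tag that has no href.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for one page's match table
html = """
<table class="simple matches">
  <tr><td><a href="/matches/1-a-vs-b">A vs B</a></td></tr>
  <tr><td><a href="/teams/42-team-c">Team C</a></td></tr>
  <tr><td><a name="no-href">no link</a></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# soup('a') is shorthand for soup.find_all('a'); get() is called once per tag
hrefs = [a.get('href') for a in soup('a')]
for href in hrefs:
    # Each value is a single str, or None when the tag has no href attribute
    print(type(href).__name__, href)
```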

So when you get a link which has 'matches' in it, you could just add it to a list.

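from bs4 import BeautifulSoup
import requests

all_links = []

# Pages 1..411, the same range as the loop in the question
for i in range(1, 412):
    try:
        print(i)
        url = 'http://www.gosugamers.net/counterstrike/gosubet?r-page={}'.format(i)
        r = requests.get(url)
        web = BeautifulSoup(r.content, "html.parser")
        table = web.find_all("table", attrs={"class": "simple matches"})
        table = table[1]  # index 1: the second matching table, as in the question
        links = table('a')  # shorthand for table.find_all('a')

        for link in links:
            href = link.get('href')  # one href string, or None if the tag has none
            if href is not None and 'matches' in href:
                all_links.append(href)

    except (requests.RequestException, IndexError):
        # Skip pages that fail to load or do not contain the expected table
        pass

print("Here are all the links:", all_links)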

Upvotes: 2

Related Questions