Reputation: 327
I am trying to scrape some statistics including the match results and team names of those matches from Gosugamers using this code:
    from bs4 import BeautifulSoup
    import requests

    for i in range(411):
        try:
            i += 1
            print(i)
            url = 'http://www.gosugamers.net/counterstrike/gosubet?r-page={}'.format(i)
            r = requests.get(url)
            web = BeautifulSoup(r.content,"html.parser")
            table = web.findAll("table", attrs={"class":"simple matches"})
            table = table[1]
            links = table('a')
            for link in links:
                if 'matches' in link.get('href', None):
                    if len(link.get('href', None)) != 0:
                        print(link.get('href', None))
        except:
            pass
However, link.get('href', None) gives me a string with the match links for a single page, and I do not know how to turn that into a list of all the links. I would be glad if someone could help me out, thanks!
Upvotes: 2
Views: 2061
Reputation: 18247
To me it seems that link.get('href', None) actually returns a single link. The get method documentation says:

    get(self, key, default=None) method of bs4.element.Tag instance
        Returns the value of the 'key' attribute for the tag, or
        the value given for 'default' if it doesn't have that
        attribute.
So when you get a link which has 'matches' in it, you could just add it to a list.
    from bs4 import BeautifulSoup
    import requests

    all_links = []
    # Pages are numbered 1..411, matching the loop in the question.
    for i in range(1, 412):
        try:
            print(i)
            url = 'http://www.gosugamers.net/counterstrike/gosubet?r-page={}'.format(i)
            r = requests.get(url)
            web = BeautifulSoup(r.content, "html.parser")
            table = web.findAll("table", attrs={"class": "simple matches"})
            table = table[1]
            links = table('a')
            for link in links:
                # get() returns one href string per tag, or None if it is absent
                href = link.get('href')
                if href is not None and 'matches' in href:
                    all_links.append(href)
        except (requests.RequestException, IndexError):
            # Skip pages that fail to load or have no second matches table
            continue

    print("Here are all the links:", all_links)
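
To check the per-tag behaviour of get without hitting the site, here is a minimal offline sketch. The HTML snippet and every href in it are made up for illustration and are not the real Gosugamers markup; only the table class and the filtering logic are taken from the code above.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for one results page (an assumption, not real site markup).
html = """
<table class="simple matches">
  <tr><td><a href="/counterstrike/tournaments/1-example">Cup</a></td></tr>
</table>
<table class="simple matches">
  <tr><td><a href="/counterstrike/matches/101-a-vs-b">A vs B</a></td></tr>
  <tr><td><a href="/counterstrike/matches/102-c-vs-d">C vs D</a></td></tr>
  <tr><td><a>no href here</a></td></tr>
  <tr><td><a href="/counterstrike/teams/7-a">Team A</a></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
tables = soup.find_all("table", attrs={"class": "simple matches"})

match_links = []
for link in tables[1]("a"):
    href = link.get("href")  # a single string for this tag, or None
    if href is not None and "matches" in href:
        match_links.append(href)

print(match_links)
```

Each call to get yields at most one href, so the list only grows because append runs once per matching tag.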
Upvotes: 2