Reputation: 103
I am trying to get the number of team members for each team on the list. Right now I get all the team links, but instead of all the links I want only the links to teams with at least 5 team members. How would I go about doing so? Nothing I have tried has worked so far.
import time
import requests
from bs4 import BeautifulSoup


def get_all(url, base):
    r = requests.get(url)
    page = r.text
    soup = BeautifulSoup(page, 'html.parser')

    for team_links in soup.select('div.details h3 a'):
        yield base + team_links['href']

    next_page = soup.find('div', {'class': 'pages'}).find('span', text='Next')
    while next_page:
        # Gives the server a break
        time.sleep(0.2)

        # Use the `base` argument rather than the global BASE_URL
        r = requests.get(base + next_page.find_previous('a')['href'])
        page = r.text
        soup = BeautifulSoup(page, 'html.parser')
        for team_links in soup.select('div.details h3 a'):
            yield base + team_links['href']
        next_page = soup.find('div', {'class': 'pages'}).find('span', text='Next')


if __name__ == '__main__':
    BASE_URL = 'http://www.gosugamers.net'
    URL = 'http://www.gosugamers.net/counterstrike/teams'

    for link in get_all(URL, BASE_URL):
        print(link)
Upvotes: 1
Views: 59
Reputation: 473833
Locate the "Members:" label, which comes after the team link in the tree. Then get the team members value, convert it to an integer, and check whether it is less than 5:
for team_links in soup.select('div.details h3 a'):
    members = int(team_links.find_next("th", text="Members:").find_next_sibling("td").text.strip())
    if members < 5:  # skip teams with less than 5 members
        continue

    yield base + team_links['href']
Note that this would fail if the value is something like "1 (Pending: 1)" instead of a plain integer. Depending on whether you want to count the pending team members or not, you would need different logic to handle that.
For instance, if you don't want to count pending team members, we can just split by space and get the first item, ignoring what is inside "pending":
for team_links in soup.select('div.details h3 a'):
    members = int(team_links.find_next("th", text="Members:").find_next_sibling("td").text.strip().split()[0])
    # ...
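If you want both behaviours in one place, here is a minimal sketch of a helper that parses the cell text. The name parse_members and the include_pending flag are made up for illustration, and the "N (Pending: M)" format is only assumed from the example above, so adjust the pattern if the page uses something else:

```python
import re


def parse_members(cell_text, include_pending=False):
    """Parse a 'Members:' cell such as '7' or '1 (Pending: 1)'.

    Hypothetical helper: the 'N (Pending: M)' format is an assumption.
    """
    match = re.match(r"\s*(\d+)(?:\s*\(Pending:\s*(\d+)\))?", cell_text)
    if match is None:
        raise ValueError("unexpected members value: %r" % cell_text)

    confirmed = int(match.group(1))
    # The pending group is None when the cell holds a plain integer
    pending = int(match.group(2) or 0)
    return confirmed + pending if include_pending else confirmed
```

Inside the loop you would then pass it the text of the td cell that follows the "Members:" label and compare the result against 5, choosing include_pending depending on whether pending members should count.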
Upvotes: 1