Reputation: 13
I've tried various ideas, but I always come back to the same two wrong results, and I don't know where I'm going wrong.
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
import re
url = "https://www.skysports.com/premier-league-table"
url_1 = "https://www.espn.com/soccer/standings/_/league/eng.1"
uClient = uReq(url)
page_html = uClient.read()
page_soup = soup(page_html, "html.parser")
uClient_1 = uReq(url_1)
page_html_1 = uClient_1.read()
page_soup_1 = soup(page_html_1, "html.parser")
teams = page_soup.find_all(class_ = "standing-table__cell--name-link")
points = page_soup_1.find_all(class_ = "stat-cell")
points_text = points[7::8]
n = 0
for i in points_text:
    n += 1
    for team in teams:
        points_text_string = str(i)
        points_clean = re.findall(r'\d+', points_text_string)
        result = "".join(points_clean)
    print(str(n)+".", team.text, result)
This prints the output below. The problem is that it prints only the last team's name, over and over again:
1. Sheffield United 86
2. Sheffield United 74
3. Sheffield United 69
4. Sheffield United 67
5. Sheffield United 66
6. Sheffield United 65
7. Sheffield United 62
8. Sheffield United 61
9. Sheffield United 59
10. Sheffield United 59
11. Sheffield United 55
12. Sheffield United 45
13. Sheffield United 45
14. Sheffield United 44
15. Sheffield United 43
16. Sheffield United 41
17. Sheffield United 39
18. Sheffield United 28
19. Sheffield United 26
20. Sheffield United 23
If I move the print into the for team in teams loop, I get all the team names, but each position and points value is repeated once for every team:
1. Manchester City 86
1. Manchester United 86
1. Liverpool 86
1. Chelsea 86
1. Leicester City 86
1. West Ham United 86
1. Tottenham Hotspur 86
1. Arsenal 86
1. Leeds United 86
1. Everton 86
1. Aston Villa 86
1. Newcastle United 86
1. Wolverhampton Wanderers 86
1. Crystal Palace 86
1. Southampton 86
1. Brighton and Hove Albion 86
1. Burnley 86
1. Fulham 86
1. West Bromwich Albion 86
1. Sheffield United 86
2. Manchester City 74
2. Manchester United 74
2. Liverpool 74
2. Chelsea 74
2. Leicester City 74
2. West Ham United 74
2. Tottenham Hotspur 74
2. Arsenal 74
2. Leeds United 74
2. Everton 74
2. Aston Villa 74
2. Newcastle United 74
2. Wolverhampton Wanderers 74
2. Crystal Palace 74
I should be getting:
1. Manchester City 86
2. Manchester United 74
3. Liverpool 69
...
20. Sheffield United 23
Upvotes: 0
Views: 62
Reputation: 46
The issue is that by the time your print statement runs, the inner loop has already finished, so team.text is always "Sheffield United", the last entry in your table.
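To see this in isolation, here is a minimal sketch using made-up stand-in lists rather than the scraped pages (the names teams, points and lines below are illustrative only). In Python, a for loop's variable keeps its last value after the loop ends, so a print placed after the inner loop always sees the final team:

```python
# Illustrative stand-in data, not the scraped page contents.
teams = ["Manchester City", "Manchester United", "Sheffield United"]
points = ["86", "74", "23"]

lines = []
for n, pts in enumerate(points, start=1):
    for team in teams:
        pass  # the inner loop runs to completion on every outer pass
    # After the inner loop, `team` still refers to the last element of `teams`.
    lines.append(f"{n}. {team} {pts}")

print("\n".join(lines))
```

Every line ends up with the last team's name, which mirrors the first output in the question.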
I rewrote the loop so that the print statement pulls from a list of names created earlier in the program; the names are added to that list inside the for team in teams loop.
# The imports, the URLs and the first page fetch are unchanged from the question.
uClient_1 = uReq(url_1)
page_html_1 = uClient_1.read()
page_soup_1 = soup(page_html_1, "html.parser")
teams = page_soup.find_all(class_ = "standing-table__cell--name-link")
points = page_soup_1.find_all(class_ = "stat-cell")
points_text = points[7::8]
team_names = [] # Creates the team name holding list referenced in print statement
n = 0
for i in points_text:
    n += 1
    for team in teams:
        points_text_string = str(i)
        points_clean = re.findall(r'\d+', points_text_string)
        result = "".join(points_clean)
        team_names.append(team.text) # adds the team name to the team names list
    # team_names[n-1] is the n-th team, because the first len(teams) entries are in table order
    print(str(n)+".", team_names[n-1], result)
Here was the output:
Upvotes: 0
Reputation: 1438
Use zip to iterate over multiple objects at once instead of nested loops. You will get a tuple of (point, team). Also, eliminate the loop counter variable n by using enumerate. This makes your code more Pythonic. Check out the corrected code below:
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
import re
url = "https://www.skysports.com/premier-league-table"
url_1 = "https://www.espn.com/soccer/standings/_/league/eng.1"
uClient = uReq(url)
page_html = uClient.read()
page_soup = soup(page_html, "html.parser")
uClient_1 = uReq(url_1)
page_html_1 = uClient_1.read()
page_soup_1 = soup(page_html_1, "html.parser")
teams = page_soup.find_all(class_ = "standing-table__cell--name-link")
points = page_soup_1.find_all(class_ = "stat-cell")
points_text = points[7::8]
for i, (point, team) in enumerate(zip(points_text, teams)):
    points_text_string = str(point)
    points_clean = re.findall(r'\d+', points_text_string)
    result = "".join(points_clean)
    print(str(i+1)+".", team.text, result)
This gives the following result:
1. Manchester City 86
2. Manchester United 74
3. Liverpool 69
4. Chelsea 67
5. Leicester City 66
6. West Ham United 65
7. Tottenham Hotspur 62
8. Arsenal 61
9. Leeds United 59
10. Everton 59
11. Aston Villa 55
12. Newcastle United 45
13. Wolverhampton Wanderers 45
14. Crystal Palace 44
15. Southampton 43
16. Brighton and Hove Albion 41
17. Burnley 39
18. Fulham 28
19. West Bromwich Albion 26
20. Sheffield United 23
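The same pairing idea can be tried without any network access. The sketch below is only an illustration: format_standings, sample_teams and sample_points are hypothetical names, and the data is copied from the first three rows of the output above. It also assumes you want an error when the two lists differ in length, since zip silently stops at the shorter input and the names and points here come from two different sites.

```python
def format_standings(team_names, points):
    """Pair each team with its points by position and number the rows from 1."""
    if len(team_names) != len(points):
        # zip() silently stops at the shorter input, so check up front.
        raise ValueError("team and points lists differ in length")
    return [
        f"{rank}. {name} {pts}"
        for rank, (name, pts) in enumerate(zip(team_names, points), start=1)
    ]


# Illustrative stand-in data, not a live scrape.
sample_teams = ["Manchester City", "Manchester United", "Liverpool"]
sample_points = ["86", "74", "69"]

for row in format_standings(sample_teams, sample_points):
    print(row)
```

Passing start=1 to enumerate removes the need for the i+1 adjustment in the print call.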
Upvotes: 1