Reputation: 49
I am trying to scrape the BBC football results website to get teams, shots, goals, cards and incidents. I currently have data for 3 teams passed into the URL.
I am writing the script in Python and using the Beautiful Soup (bs4) package. When outputting the results to screen, the first team is printed, then the first and second teams, then the first, second and third teams. So the first team is effectively being printed 3 times, when I am trying to get each of the 3 teams just once.
Once I have this problem sorted I will write the results to file. I am adding the teams data into data frames then into a list (I am not sure if this is the best method).
I am sure it is something to do with the for loops, but I am unsure how to resolve the problem.
Code:
from bs4 import BeautifulSoup
import urllib2
import pandas as pd

out_list = []
for numb in ('EFBO839787', 'EFBO839786', 'EFBO815155'):
    url = 'http://www.bbc.co.uk/sport/football/result/partial/' + numb + '?teamview=false'
    teams_list = []
    inner_page = urllib2.urlopen(url).read()
    soupb = BeautifulSoup(inner_page, 'lxml')

    for report in soupb.find_all('td', 'match-details'):
        home_tag = report.find('span', class_='team-home')
        home_team = home_tag and ''.join(home_tag.stripped_strings)
        score_tag = report.find('span', class_='score')
        score = score_tag and ''.join(score_tag.stripped_strings)
        shots_tag = report.find('span', class_='shots-on-target')
        shots = shots_tag and ''.join(shots_tag.stripped_strings)
        away_tag = report.find('span', class_='team-away')
        away_team = away_tag and ''.join(away_tag.stripped_strings)
        df = pd.DataFrame({'away_team': [away_team], 'home_team': [home_team], 'score': [score]})
        out_list.append(df)

    for shots in soupb.find_all('td', class_='shots'):
        home_shots_tag = shots.find('span', class_='goal-count-home')
        home_shots = home_shots_tag and ''.join(home_shots_tag.stripped_strings)
        away_shots_tag = shots.find('span', class_='goal-count-away')
        away_shots = away_shots_tag and ''.join(away_shots_tag.stripped_strings)
        dfb = pd.DataFrame({'home_shots': [home_shots], 'away_shots': [away_shots]})
        out_list.append(dfb)

    for incidents in soupb.find("table", class_="incidents-table").find("tbody").find_all("tr"):
        home_inc_tag = incidents.find("td", class_="incident-player-home")
        home_inc = home_inc_tag and ''.join(home_inc_tag.stripped_strings)
        type_inc_goal_tag = incidents.find("td", "span", class_="incident-type goal")
        type_inc_goal = type_inc_goal_tag and ''.join(type_inc_goal_tag.stripped_strings)
        type_inc_tag = incidents.find("td", class_="incident-type")
        type_inc = type_inc_tag and ''.join(type_inc_tag.stripped_strings)
        time_inc_tag = incidents.find('td', class_='incident-time')
        time_inc = time_inc_tag and ''.join(time_inc_tag.stripped_strings)
        away_inc_tag = incidents.find('td', class_='incident-player-away')
        away_inc = away_inc_tag and ''.join(away_inc_tag.stripped_strings)
        df_incidents = pd.DataFrame({'home_player': [home_inc], 'event_type': [type_inc_goal], 'event_time': [time_inc], 'away_player': [away_inc]})
        out_list.append(df_incidents)

    print "end"
    print out_list
I am new to Python and Stack Overflow, so any suggestions on formatting my questions would also be useful.
Thanks in advance!
Upvotes: 0
Views: 101
Reputation: 12150
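This looks like a printing problem. At what indentation level are you printing out_list?
To print the combined results once, the print statement should be at zero indentation (all the way to the left in your code), so that it runs only after the outer for loop has finished.
Alternatively, if you want to print each page's results separately, move the out_list = [] initialisation inside the outermost for loop so that the list is reset on every iteration.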
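To illustrate the difference, here is a minimal sketch in Python 3 that needs no network access. The `pages` dictionary and the `collect` helper are hypothetical stand-ins for the scraped pages, not part of the original script. It records what would be printed when the print sits inside the loop versus after it.

```python
# Hypothetical stand-in for the scraped pages: one team name per match ID.
pages = {"match-1": "Team A", "match-2": "Team B", "match-3": "Team C"}


def collect(print_inside_loop):
    """Return the snapshots that would be printed for either print placement."""
    printed = []
    out_list = []  # created once, so it keeps growing across iterations
    for match_id, team in pages.items():
        out_list.append(team)
        if print_inside_loop:
            # Printing here shows the whole accumulated list on every pass.
            printed.append(list(out_list))
    if not print_inside_loop:
        # Printing here (zero indentation in a script) shows the list once.
        printed.append(list(out_list))
    return printed


inside = collect(print_inside_loop=True)
outside = collect(print_inside_loop=False)

print(inside)
print(outside)
```

With the print inside the loop, the first team appears in all three snapshots, which matches the symptom described in the question. With the print after the loop, each team appears once.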
Upvotes: 0
Reputation: 1060
Those 3 for loops should be inside your main for loop:

# Python 3 shown here; on Python 2, keep urllib2.urlopen as in the question.
import urllib.request
from bs4 import BeautifulSoup

out_list = []
for numb in ('EFBO839787', 'EFBO839786', 'EFBO815155'):
    url = 'http://www.bbc.co.uk/sport/football/result/partial/' + numb + '?teamview=false'
    teams_list = []
    inner_page = urllib.request.urlopen(url).read()
    soupb = BeautifulSoup(inner_page, 'lxml')

    for report in soupb.find_all('td', 'match-details'):
        # your code as it is
    for shots in soupb.find_all('td', class_='shots'):
        # your code as it is
    for incidents in soupb.find("table", class_="incidents-table").find("tbody").find_all("tr"):
        # your code as it is
With that structure it works just fine: each team shows up just once.
Here's the output of the first for loop:
[{'score': ['1-3'], 'away_team': ['Man City'], 'home_team': ['Dynamo Kiev']},
{'score': ['1-0'], 'away_team': ['Zenit St P'], 'home_team': ['Benfica']},
{'score': ['1-2'], 'away_team': ['Boston United'], 'home_team': ['Bradford Park Avenue']}]
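For reference, a list of dictionaries like this can be collected along the following lines. This is only a sketch under stated assumptions: `SAMPLE_PAGES` is made-up, simplified markup rather than the real BBC page, `text_of` and `parse_match` are hypothetical helper names, and the values are plain strings rather than the one-element lists shown above. It uses the built-in `html.parser` so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for two downloaded pages.
SAMPLE_PAGES = {
    "match-1": """
        <table><tr><td class="match-details">
          <span class="team-home">Home A</span>
          <span class="score">1-3</span>
          <span class="team-away">Away A</span>
        </td></tr></table>""",
    "match-2": """
        <table><tr><td class="match-details">
          <span class="team-home">Home B</span>
          <span class="score">1-0</span>
          <span class="team-away">Away B</span>
        </td></tr></table>""",
}


def text_of(tag):
    """Return the tag's joined text, or None if the tag was not found."""
    return "".join(tag.stripped_strings) if tag is not None else None


def parse_match(html):
    """Return one dict per match-details cell found in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for report in soup.find_all("td", class_="match-details"):
        rows.append({
            "home_team": text_of(report.find("span", class_="team-home")),
            "score": text_of(report.find("span", class_="score")),
            "away_team": text_of(report.find("span", class_="team-away")),
        })
    return rows


results = []
for match_id, html in SAMPLE_PAGES.items():
    # Parsing happens inside the loop, so every page contributes its rows.
    results.extend(parse_match(html))

# Printed once, after the loop has finished.
print(results)
```

A list of dictionaries like `results` can also be passed to `pd.DataFrame(results)` to get one table with a row per match, which may be simpler than keeping a list of single-row data frames.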
Upvotes: 1