paulg

Reputation: 49

Looping through scraped data and outputting the result

I am trying to scrape the BBC football results website to get teams, shots, goals, cards and incidents. I currently pass three IDs into the URL, one request per ID.

I am writing the script in Python using the BeautifulSoup (bs4) package. When I print the results to the screen, the first team is printed, then the first and second teams, then the first, second and third teams. So the first team is effectively printed 3 times, when I am trying to get the 3 teams just once.
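That pattern (1, then 1 and 2, then 1, 2 and 3) is what you typically get when a list keeps growing across iterations and the print call sits inside the loop. A minimal sketch of the symptom, using hypothetical placeholder strings instead of scraped data:

```python
# Hypothetical stand-ins for the three scraped matches (no network access)
matches = ['match 1', 'match 2', 'match 3']

out_list = []
printed = []  # records what each print call shows, for inspection
for m in matches:
    out_list.append(m)
    # print placed INSIDE the loop: shows everything accumulated so far
    printed.append(list(out_list))
    print(out_list)
```

This prints a growing list three times, so the first match appears in every printout.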

Once I have this problem sorted, I will write the results to a file. I am adding the teams' data into DataFrames and then into a list (I am not sure if this is the best method). I am sure it is something to do with the for loops, but I am unsure how to resolve the problem. Code:

from bs4 import BeautifulSoup
import urllib2
import pandas as pd


out_list = []
for numb in ('EFBO839787', 'EFBO839786', 'EFBO815155'):

    url = 'http://www.bbc.co.uk/sport/football/result/partial/' + numb + '?teamview=false'
    teams_list = []
    inner_page = urllib2.urlopen(url).read()
    soupb = BeautifulSoup(inner_page, 'lxml')

for report in soupb.find_all('td', 'match-details'):
    home_tag = report.find('span', class_='team-home')
    home_team = home_tag and ''.join(home_tag.stripped_strings)

    score_tag = report.find('span', class_='score')
    score = score_tag and ''.join(score_tag.stripped_strings)

    shots_tag = report.find('span', class_='shots-on-target')
    shots = shots_tag and ''.join(shots_tag.stripped_strings)

    away_tag = report.find('span', class_='team-away')
    away_team = away_tag and ''.join(away_tag.stripped_strings)

    df = pd.DataFrame({'away_team': [away_team], 'home_team': [home_team], 'score': [score]})
    out_list.append(df)

for shots in soupb.find_all('td', class_='shots'):

    home_shots_tag = shots.find('span', class_='goal-count-home')
    home_shots = home_shots_tag and ''.join(home_shots_tag.stripped_strings)

    away_shots_tag = shots.find('span', class_='goal-count-away')
    away_shots = away_shots_tag and ''.join(away_shots_tag.stripped_strings)

    dfb = pd.DataFrame({'home_shots': [home_shots], 'away_shots': [away_shots]})
    out_list.append(dfb)

for incidents in soupb.find("table", class_="incidents-table").find("tbody").find_all("tr"):

    home_inc_tag = incidents.find("td", class_="incident-player-home")
    home_inc = home_inc_tag and ''.join(home_inc_tag.stripped_strings)

    type_inc_goal_tag = incidents.find("td", "span", class_="incident-type goal")
    type_inc_goal = type_inc_goal_tag and ''.join(type_inc_goal_tag.stripped_strings)

    type_inc_tag = incidents.find("td", class_="incident-type")
    type_inc = type_inc_tag and ''.join(type_inc_tag.stripped_strings)

    time_inc_tag = incidents.find('td', class_='incident-time')
    time_inc = time_inc_tag and ''.join(time_inc_tag.stripped_strings)

    away_inc_tag = incidents.find('td', class_='incident-player-away')
    away_inc = away_inc_tag and ''.join(away_inc_tag.stripped_strings)

    df_incidents = pd.DataFrame({'home_player': [home_inc], 'event_type': [type_inc_goal], 'event_time': [time_inc], 'away_player': [away_inc]})

    out_list.append(df_incidents)


print "end"

print out_list
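Side note: the code above repeats the pattern x_tag and ''.join(x_tag.stripped_strings) many times. It returns None when find() finds nothing, and the joined text otherwise. A small helper can hold that logic in one place. This is a sketch in Python 3; the helper name tag_text and the sample HTML are hypothetical (the markup only mimics the class names above), and it uses bs4's built-in 'html.parser' so lxml is not required:

```python
from bs4 import BeautifulSoup


def tag_text(parent, name, css_class):
    """Hypothetical helper: joined, whitespace-stripped text of the first
    matching tag, or None if there is no such tag (same logic as the
    repeated `tag and ''.join(tag.stripped_strings)` lines)."""
    tag = parent.find(name, class_=css_class)
    return tag and ''.join(tag.stripped_strings)


# Hypothetical markup that only mimics the class names used in the question
html = """
<table><tr><td class="match-details">
  <span class="team-home"> Benfica </span>
  <span class="score"> 1-0 </span>
  <span class="team-away"> Zenit St P </span>
</td></tr></table>
"""

soup = BeautifulSoup(html, 'html.parser')
report = soup.find('td', class_='match-details')
row = {
    'home_team': tag_text(report, 'span', 'team-home'),
    'score': tag_text(report, 'span', 'score'),
    'away_team': tag_text(report, 'span', 'team-away'),
    # No such span in the sample markup, so this is None rather than an error
    'shots': tag_text(report, 'span', 'shots-on-target'),
}
print(row)
```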

I am new to Python and Stack Overflow, so any suggestions on formatting my questions are also welcome.

Thanks in advance!

Upvotes: 0

Views: 101

Answers (2)

ffledgling

Reputation: 12150

This looks like a printing problem: at what indentation level are you printing out_list?

It should be at zero indentation, all the way to the left in your code.

Either that, or you want to move the out_list = [] line into the topmost for loop so that the list is re-created on every iteration.
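A rough sketch of both options, with hypothetical placeholder strings instead of scraped data (the two options use different list names here only so they don't overwrite each other):

```python
matches = ['match 1', 'match 2', 'match 3']  # hypothetical placeholders

# Option 1: one list for all matches, printed once at zero indentation
all_results = []
for m in matches:
    all_results.append(m)
print(all_results)  # runs once, after the loop: each match appears once

# Option 2: re-create the list at the top of each iteration,
# so a print inside the loop shows only the current match
for m in matches:
    match_results = []  # reset for this match
    match_results.append(m)
    print(match_results)
```

Option 1 is the one to use if you want a single collection to write to a file afterwards; Option 2 discards each match's list once the next iteration starts.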

Upvotes: 0

Bidhan Bhattarai

Reputation: 1060

Those three inner for loops should be inside your main for loop.

from bs4 import BeautifulSoup
import urllib.request
import pandas as pd

out_list = []
for numb in ('EFBO839787', 'EFBO839786', 'EFBO815155'):
    url = 'http://www.bbc.co.uk/sport/football/result/partial/' + numb + '?teamview=false'
    teams_list = []
    inner_page = urllib.request.urlopen(url).read()
    soupb = BeautifulSoup(inner_page, 'lxml')

    for report in soupb.find_all('td', 'match-details'):
        pass  # your code as it is

    for shots in soupb.find_all('td', class_='shots'):
        pass  # your code as it is

    for incidents in soupb.find("table", class_="incidents-table").find("tbody").find_all("tr"):
        pass  # your code as it is

print(out_list)  # zero indentation: printed once, after all pages are processed

It works fine: each team shows up just once.

Here's the output of the first for loop:

[{'score': ['1-3'], 'away_team': ['Man City'], 'home_team': ['Dynamo Kiev']}, 
{'score': ['1-0'], 'away_team': ['Zenit St P'], 'home_team': ['Benfica']}, 
{'score': ['1-2'], 'away_team': ['Boston United'], 'home_team': ['Bradford Park Avenue']}]
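On the question's "DataFrames into a list" point: one common alternative is to collect one plain dict per row inside the loops and build a single DataFrame after the outer loop. The sketch below runs offline, so the pages are hypothetical stand-ins built by a made-up fake_page helper (the markup only mimics the class names used in the question), filled with the team names and scores from the output above. It parses with bs4's built-in 'html.parser' and needs Python 3.6+ for f-strings:

```python
import pandas as pd
from bs4 import BeautifulSoup


def fake_page(home, score, away):
    """Hypothetical stand-in for one fetched results page (no network access)."""
    return ('<table><tr><td class="match-details">'
            f'<span class="team-home">{home}</span>'
            f'<span class="score">{score}</span>'
            f'<span class="team-away">{away}</span>'
            '</td></tr></table>')


# Team names and scores copied from the output above
pages = [
    fake_page('Dynamo Kiev', '1-3', 'Man City'),
    fake_page('Benfica', '1-0', 'Zenit St P'),
    fake_page('Bradford Park Avenue', '1-2', 'Boston United'),
]

rows = []
for inner_page in pages:
    soupb = BeautifulSoup(inner_page, 'html.parser')
    # Indented under the outer loop, so it runs once per page on that page's soup
    for report in soupb.find_all('td', class_='match-details'):
        row = {}
        for key, css_class in (('home_team', 'team-home'),
                               ('score', 'score'),
                               ('away_team', 'team-away')):
            tag = report.find('span', class_=css_class)
            row[key] = tag and tag.get_text(strip=True)
        rows.append(row)

# Zero indentation: built once, after every page has been processed
results = pd.DataFrame(rows, columns=['home_team', 'score', 'away_team'])
print(results)
```

If you prefer to keep a list of one-row DataFrames, pd.concat(out_list, ignore_index=True) stacks them into one table, which works best when the frames share the same columns (the shots and incidents frames have different columns, so they are probably easier to keep in separate lists). Either way, DataFrame.to_csv can then write the single table to a file.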

Upvotes: 1
