I am trying to scrape football matches from https://www.skysports.com/premier-league-fixtures and I am struggling to assign the date to each match, because the dates are in separate elements rather than wrapping the matches they belong to.
What I have done so far is:
```python
def get_fixtures(webpage):
    table = webpage.find('div', {'class': 'fixres__body'})
    dates = table.find_all('h4', {'class': 'fixres__header2'})
    fixtures = table.find_all('div', 'fixres__item')
    dates_list = []
    fixtures_list = []
    for date in dates:
        dates_list.append(date.text)
    for fixture in fixtures:
        a = fixture.find('a')
        home_span = a.find('span',
                           {'class': 'matches__item-col matches__participant matches__participant--side1'})
        home_team = home_span.find('span').text.strip()
        away_span = a.find('span',
                           {'class': 'matches__item-col matches__participant matches__participant--side2'})
        away_team = away_span.find('span').text.strip()
        match = [home_team, away_team]
        fixtures_list.append(match)
    return dates_list, fixtures_list
```
This gives me a list of fixtures and a list of dates, but they are not linked to each other (there is one date per match day, not one per match). Is there a way to iterate down the page so that when I hit a date header (an `h4`), I can pull all of the fixtures immediately after it until the next date header?
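The forward walk described above is possible: `find_all()` returns matching tags in document order, so you can ask for the date headers and the fixture items in a single call and remember the most recent date as you go. A minimal sketch, assuming the simplified markup below, which is a hypothetical stand-in for the real page (the `home`/`away` span classes are invented for brevity; on the real site you would use the longer `matches__participant--side1`/`--side2` lookups from the question, and the live markup may differ):

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup mimicking the fixres__* class names.
HTML = """
<div class="fixres__body">
  <h4 class="fixres__header2">Friday 21st January</h4>
  <div class="fixres__item"><span class="home">Watford</span><span class="away">Norwich City</span></div>
  <h4 class="fixres__header2">Saturday 22nd January</h4>
  <div class="fixres__item"><span class="home">Everton</span><span class="away">Aston Villa</span></div>
  <div class="fixres__item"><span class="home">Brentford</span><span class="away">Wolverhampton Wanderers</span></div>
</div>
"""


def is_header_or_fixture(tag):
    """Match date headers (h4.fixres__header2) and fixtures (div.fixres__item)."""
    classes = tag.get("class", [])
    return (tag.name == "h4" and "fixres__header2" in classes) or (
        tag.name == "div" and "fixres__item" in classes
    )


def fixtures_by_date(table):
    """Walk headers and fixtures in page order, tagging each fixture with the last date seen."""
    results = []
    current_date = None
    for tag in table.find_all(is_header_or_fixture):
        if tag.name == "h4":
            current_date = tag.get_text(strip=True)  # a new match day starts here
        else:
            # "home"/"away" are the hypothetical classes from the sample markup above.
            home = tag.find("span", class_="home").get_text(strip=True)
            away = tag.find("span", class_="away").get_text(strip=True)
            results.append((current_date, home, away))
    return results


soup = BeautifulSoup(HTML, "html.parser")
table = soup.find("div", class_="fixres__body")
for row in fixtures_by_date(table):
    print(row)
```

Because headers and fixtures come back interleaved in a single list, each fixture simply inherits whichever date was seen last.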
I would look at it slightly differently. Rather than grabbing the dates and then assigning matches to them, find each match and then assign its date. Do this by first getting all the fixtures and iterating through those. Once you have a fixture, have BeautifulSoup look backwards with `.findPrevious()` (the older camelCase alias of `.find_previous()`) to find the date header that the fixture follows.
```python
def get_fixtures(webpage):
    table = webpage.find('div', {'class': 'fixres__body'})
    dates_list = []
    fixtures_list = []
    fixtures = table.find_all('div', {'class':'fixres__item'})
    for fixture in fixtures:
        a = fixture.find('a')
        home_span = a.find('span',
                           {'class': 'matches__item-col matches__participant matches__participant--side1'})
        home_team = home_span.find('span').text.strip()
        away_span = a.find('span',
                           {'class': 'matches__item-col matches__participant matches__participant--side2'})
        away_team = away_span.find('span').text.strip()
        match = [home_team, away_team]
        fixtures_list.append(match)
        date = fixture.findPrevious('h4', {'class': 'fixres__header2'}).text
        dates_list.append(date)
    return dates_list, fixtures_list
```
Output:
```
['Friday 21st January', 'Saturday 22nd January', 'Saturday 22nd January', 'Saturday 22nd January', 'Saturday 22nd January', 'Saturday 22nd January', 'Sunday 23rd January', 'Sunday 23rd January', 'Sunday 23rd January', 'Sunday 23rd January', 'Tuesday 8th February', 'Tuesday 8th February', 'Tuesday 8th February', 'Wednesday 9th February', 'Wednesday 9th February', 'Wednesday 9th February', 'Wednesday 9th February', 'Thursday 10th February', 'Thursday 10th February', 'Saturday 12th February', 'Saturday 12th February', 'Saturday 12th February', 'Saturday 12th February', 'Saturday 12th February', 'Saturday 12th February', 'Sunday 13th February', 'Sunday 13th February', 'Sunday 13th February', 'Sunday 13th February', 'Saturday 19th February', 'Saturday 19th February', 'Saturday 19th February', 'Saturday 19th February', 'Saturday 19th February', 'Saturday 19th February', 'Saturday 19th February', 'Saturday 19th February', 'Sunday 20th February', 'Sunday 20th February', 'Friday 25th February', 'Saturday 26th February', 'Saturday 26th February', 'Saturday 26th February', 'Saturday 26th February', 'Saturday 26th February', 'Saturday 26th February', 'Saturday 26th February', 'Saturday 26th February', 'Sunday 27th February', 'Saturday 5th March', 'Saturday 5th March', 'Saturday 5th March', 'Saturday 5th March', 'Saturday 5th March', 'Saturday 5th March', 'Saturday 5th March', 'Saturday 5th March', 'Saturday 5th March', 'Saturday 5th March', 'Saturday 12th March', 'Saturday 12th March', 'Saturday 12th March', 'Saturday 12th March', 'Saturday 12th March', 'Saturday 12th March', 'Saturday 12th March', 'Saturday 12th March', 'Saturday 12th March', 'Saturday 12th March', 'Saturday 19th March', 'Saturday 19th March', 'Saturday 19th March', 'Saturday 19th March', 'Saturday 19th March', 'Saturday 19th March', 'Saturday 19th March', 'Saturday 19th March', 'Saturday 19th March', 'Saturday 19th March', 'Saturday 2nd April', 'Saturday 2nd April', 'Saturday 2nd April', 'Saturday 2nd April', 'Saturday 2nd April', 'Saturday 2nd April', 'Saturday 2nd April', 'Saturday 2nd April', 'Saturday 2nd April', 'Saturday 2nd April', 'Saturday 9th April', 'Saturday 9th April', 'Saturday 9th April', 'Saturday 9th April', 'Saturday 9th April', 'Saturday 9th April', 'Saturday 9th April', 'Saturday 9th April', 'Saturday 9th April', 'Saturday 9th April', 'Saturday 16th April', 'Saturday 16th April', 'Saturday 16th April', 'Saturday 16th April', 'Saturday 16th April', 'Saturday 16th April', 'Saturday 16th April', 'Saturday 16th April', 'Saturday 16th April', 'Saturday 16th April', 'Saturday 23rd April', 'Saturday 23rd April', 'Saturday 23rd April', 'Saturday 23rd April', 'Saturday 23rd April', 'Saturday 23rd April', 'Saturday 23rd April', 'Saturday 23rd April', 'Saturday 23rd April', 'Saturday 23rd April', 'Saturday 30th April', 'Saturday 30th April', 'Saturday 30th April', 'Saturday 30th April', 'Saturday 30th April', 'Saturday 30th April', 'Saturday 30th April', 'Saturday 30th April', 'Saturday 30th April', 'Saturday 30th April', 'Saturday 7th May', 'Saturday 7th May', 'Saturday 7th May', 'Saturday 7th May', 'Saturday 7th May', 'Saturday 7th May', 'Saturday 7th May', 'Saturday 7th May', 'Saturday 7th May', 'Saturday 7th May', 'Sunday 15th May', 'Sunday 15th May', 'Sunday 15th May', 'Sunday 15th May', 'Sunday 15th May', 'Sunday 15th May', 'Sunday 15th May', 'Sunday 15th May', 'Sunday 15th May', 'Sunday 15th May', 'Sunday 22nd May', 'Sunday 22nd May', 'Sunday 22nd May', 'Sunday 22nd May', 'Sunday 22nd May', 'Sunday 22nd May', 'Sunday 22nd May', 'Sunday 22nd May', 'Sunday 22nd May', 'Sunday 22nd May']
[['Watford', 'Norwich City'], ['Everton', 'Aston Villa'], ['Brentford', 'Wolverhampton Wanderers'], ['Leeds United', 'Newcastle United'], ['Manchester United', 'West Ham United'], ['Southampton', 'Manchester City'], ['Arsenal', 'Burnley'], ['Crystal Palace', 'Liverpool'], ['Leicester City', 'Brighton and Hove Albion'], ['Chelsea', 'Tottenham Hotspur'], ['Newcastle United', 'Everton'], ['West Ham United', 'Watford'], ['Burnley', 'Manchester United'], ['Manchester City', 'Brentford'], ['Norwich City', 'Crystal Palace'], ['Tottenham Hotspur', 'Southampton'], ['Aston Villa', 'Leeds United'], ['Liverpool', 'Leicester City'], ['Wolverhampton Wanderers', 'Arsenal'], ['Manchester United', 'Southampton'], ['Brentford', 'Crystal Palace'], ['Chelsea', 'Arsenal'], ['Everton', 'Leeds United'], ['Watford', 'Brighton and Hove Albion'], ['Norwich City', 'Manchester City'], ['Burnley', 'Liverpool'], ['Newcastle United', 'Aston Villa'], ['Tottenham Hotspur', 'Wolverhampton Wanderers'], ['Leicester City', 'West Ham United'], ['West Ham United', 'Newcastle United'], ['Arsenal', 'Brentford'], ['Aston Villa', 'Watford'], ['Brighton and Hove Albion', 'Burnley'], ['Crystal Palace', 'Chelsea'], ['Liverpool', 'Norwich City'], ['Southampton', 'Everton'], ['Manchester City', 'Tottenham Hotspur'], ['Leeds United', 'Manchester United'], ['Wolverhampton Wanderers', 'Leicester City'], ['Southampton', 'Norwich City'], ['Leeds United', 'Tottenham Hotspur'], ['Arsenal', 'Liverpool'], ['Brentford', 'Newcastle United'], ['Brighton and Hove Albion', 'Aston Villa'], ['Crystal Palace', 'Burnley'], ['Manchester United', 'Watford'], ['West Ham United', 'Wolverhampton Wanderers'], ['Everton', 'Manchester City'], ['Chelsea', 'Leicester City'], ['Aston Villa', 'Southampton'], ['Burnley', 'Chelsea'], ['Leicester City', 'Leeds United'], ['Liverpool', 'West Ham United'], ['Manchester City', 'Manchester United'], ['Newcastle United', 'Brighton and Hove Albion'], ['Norwich City', 'Brentford'], ['Tottenham Hotspur', 'Everton'], ['Watford', 'Arsenal'], ['Wolverhampton Wanderers', 'Crystal Palace'], ['Arsenal', 'Leicester City'], ['Brentford', 'Burnley'], ['Brighton and Hove Albion', 'Liverpool'], ['Chelsea', 'Newcastle United'], ['Crystal Palace', 'Manchester City'], ['Everton', 'Wolverhampton Wanderers'], ['Leeds United', 'Norwich City'], ['Manchester United', 'Tottenham Hotspur'], ['Southampton', 'Watford'], ['West Ham United', 'Aston Villa'], ['Aston Villa', 'Arsenal'], ['Burnley', 'Southampton'], ['Leicester City', 'Brentford'], ['Liverpool', 'Manchester United'], ['Manchester City', 'Brighton and Hove Albion'], ['Newcastle United', 'Crystal Palace'], ['Norwich City', 'Chelsea'], ['Tottenham Hotspur', 'West Ham United'], ['Watford', 'Everton'], ['Wolverhampton Wanderers', 'Leeds United'], ['Brighton and Hove Albion', 'Norwich City'], ['Burnley', 'Manchester City'], ['Chelsea', 'Brentford'], ['Crystal Palace', 'Arsenal'], ['Leeds United', 'Southampton'], ['Liverpool', 'Watford'], ['Manchester United', 'Leicester City'], ['Tottenham Hotspur', 'Newcastle United'], ['West Ham United', 'Everton'], ['Wolverhampton Wanderers', 'Aston Villa'], ['Arsenal', 'Brighton and Hove Albion'], ['Aston Villa', 'Tottenham Hotspur'], ['Brentford', 'West Ham United'], ['Everton', 'Manchester United'], ['Leicester City', 'Crystal Palace'], ['Manchester City', 'Liverpool'], ['Newcastle United', 'Wolverhampton Wanderers'], ['Norwich City', 'Burnley'], ['Southampton', 'Chelsea'], ['Watford', 'Leeds United'], ['Aston Villa', 'Liverpool'], ['Everton', 'Crystal Palace'], ['Leeds United', 'Chelsea'], ['Manchester United', 'Norwich City'], ['Newcastle United', 'Leicester City'], ['Southampton', 'Arsenal'], ['Tottenham Hotspur', 'Brighton and Hove Albion'], ['Watford', 'Brentford'], ['West Ham United', 'Burnley'], ['Wolverhampton Wanderers', 'Manchester City'], ['Arsenal', 'Manchester United'], ['Brentford', 'Tottenham Hotspur'], ['Brighton and Hove Albion', 'Southampton'], ['Burnley', 'Wolverhampton Wanderers'], ['Chelsea', 'West Ham United'], ['Crystal Palace', 'Leeds United'], ['Leicester City', 'Aston Villa'], ['Liverpool', 'Everton'], ['Manchester City', 'Watford'], ['Norwich City', 'Newcastle United'], ['Aston Villa', 'Norwich City'], ['Everton', 'Chelsea'], ['Leeds United', 'Manchester City'], ['Manchester United', 'Brentford'], ['Newcastle United', 'Liverpool'], ['Southampton', 'Crystal Palace'], ['Tottenham Hotspur', 'Leicester City'], ['Watford', 'Burnley'], ['West Ham United', 'Arsenal'], ['Wolverhampton Wanderers', 'Brighton and Hove Albion'], ['Arsenal', 'Leeds United'], ['Brentford', 'Southampton'], ['Brighton and Hove Albion', 'Manchester United'], ['Burnley', 'Aston Villa'], ['Chelsea', 'Wolverhampton Wanderers'], ['Crystal Palace', 'Watford'], ['Leicester City', 'Everton'], ['Liverpool', 'Tottenham Hotspur'], ['Manchester City', 'Newcastle United'], ['Norwich City', 'West Ham United'], ['Aston Villa', 'Crystal Palace'], ['Everton', 'Brentford'], ['Leeds United', 'Brighton and Hove Albion'], ['Manchester United', 'Chelsea'], ['Newcastle United', 'Arsenal'], ['Southampton', 'Liverpool'], ['Tottenham Hotspur', 'Burnley'], ['Watford', 'Leicester City'], ['West Ham United', 'Manchester City'], ['Wolverhampton Wanderers', 'Norwich City'], ['Arsenal', 'Everton'], ['Brentford', 'Leeds United'], ['Brighton and Hove Albion', 'West Ham United'], ['Burnley', 'Newcastle United'], ['Chelsea', 'Watford'], ['Crystal Palace', 'Manchester United'], ['Leicester City', 'Southampton'], ['Liverpool', 'Wolverhampton Wanderers'], ['Manchester City', 'Aston Villa'], ['Norwich City', 'Tottenham Hotspur']]
```
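To see why `findPrevious()` works here, the sketch below runs `find_previous()` (the snake_case name of the same method) on a small hypothetical snippet rather than the real page. It searches backwards through everything parsed before the tag, so it finds the nearest preceding date header even though the header is a sibling of the fixture, not its parent:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: date headers are siblings of the fixtures, not their parents.
HTML = """
<div class="fixres__body">
  <h4 class="fixres__header2">Sunday 13th February</h4>
  <div class="fixres__item">Burnley v Liverpool</div>
  <h4 class="fixres__header2">Saturday 19th February</h4>
  <div class="fixres__item">West Ham United v Newcastle United</div>
  <div class="fixres__item">Arsenal v Brentford</div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
pairs = []
for fixture in soup.find_all("div", class_="fixres__item"):
    # Nearest h4 date header that appears *before* this fixture in the document.
    header = fixture.find_previous("h4", class_="fixres__header2")
    pairs.append((header.get_text(strip=True), fixture.get_text(strip=True)))

for date, match in pairs:
    print(f"{date}: {match}")
```

If a fixture could ever appear before the first header, `find_previous()` returns `None`, so check for that before reading `.text`.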
Also, just as a side note, I would probably structure this differently. You end up with lots of duplicate values (i.e. the date is repeated for every match on that day), and you have to be careful that all your lists stay the same length. With only two lists that's not too difficult to manage, but parallel lists can become a headache.
I'd consider building a JSON-like structure instead, like below: a dictionary keyed by date, where each match is itself a dictionary, so you can include other values as key:value pairs:
```python
def get_fixtures(webpage):
    table = webpage.find('div', {'class': 'fixres__body'})
    data = {}
    fixtures = table.find_all('div', {'class':'fixres__item'})
    for fixture in fixtures:
        a = fixture.find('a')
        home_span = a.find('span',
                           {'class': 'matches__item-col matches__participant matches__participant--side1'})
        home_team = home_span.find('span').text.strip()
        home_score = a.find_all('span', {'class':'matches__teamscores-side'})[0].text.strip()
        away_span = a.find('span',
                           {'class': 'matches__item-col matches__participant matches__participant--side2'})
        away_team = away_span.find('span').text.strip()
        away_score = a.find_all('span', {'class':'matches__teamscores-side'})[-1].text.strip()
        game_time = a.find('span', {'class':'matches__date'}).text.strip()
        game_status = a['data-status']
        match = {'homeTeam':home_team,
                 'homeScore':home_score,
                 'awayTeam':away_team,
                 'awayScore':away_score,
                 'gameTime':game_time,
                 'gameStatus':game_status}
        date = fixture.findPrevious('h4', {'class': 'fixres__header2'}).text
        if date not in data.keys():
            data[date] = []
        data[date].append(match)
    return data
```
Now you have a dictionary that maps each date to a list of that day's matches, each with the home and away team, the start time, and the score (if the game is live or finished). It also picks up each game's status, so a postponed game is flagged as such.
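Because the result is just dictionaries and lists of strings, it can be serialised straight to JSON or flattened back into one row per match (for example for CSV). A sketch using a hypothetical, hand-written `data` value in the shape this `get_fixtures()` returns; the team names come from the output above, but the times, empty scores and `"placeholder"` status strings are invented for illustration, not values taken from the site:

```python
import csv
import io
import json

# Hypothetical sample in the shape returned by get_fixtures(): {date: [match, ...]}.
# Times, scores and status strings are placeholders, not real data from the site.
data = {
    "Friday 21st January": [
        {"homeTeam": "Watford", "homeScore": "", "awayTeam": "Norwich City",
         "awayScore": "", "gameTime": "20:00", "gameStatus": "placeholder"},
    ],
    "Saturday 22nd January": [
        {"homeTeam": "Everton", "homeScore": "", "awayTeam": "Aston Villa",
         "awayScore": "", "gameTime": "12:30", "gameStatus": "placeholder"},
        {"homeTeam": "Brentford", "homeScore": "", "awayTeam": "Wolverhampton Wanderers",
         "awayScore": "", "gameTime": "15:00", "gameStatus": "placeholder"},
    ],
}

# Nested structure -> JSON text.
print(json.dumps(data, indent=2))

# Nested structure -> one flat row per match, with the date copied onto each row.
rows = [{"date": date, **match} for date, matches in data.items() for match in matches]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Keeping the nested form for storage and flattening only when a tabular format is needed avoids carrying the duplicated date around in the scraper itself.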