Aymen Eddaoudi
Aymen Eddaoudi

Reputation: 35

Match Data in Excel using python

I am making a betting software that scrapes the fixtures from a website, puts them into an Excel file and then compares it with another Excel file that I have already manually created to find the winner (the source) The source file would look like this and The scraped output would look like this

What I want to do is for the software to find matching fixtures between the two files, then figure out the result using the "Result" column in the source file.

Problem is I don't know how to do it, and I have no idea how to look for it, I'm such a beginner and I picked this project for school, can someone at least give me a name to what I'm trying to do (if it has a name), an idea, a path to follow, etc...

Thanks a lot to everyone !

Here is my code for now (Only scraping and making an Excel file)

import requests
from bs4 import BeautifulSoup as bs
import pandas as pd

#Formation de l'url à scraper
url= "https://www.bbc.com/sport/football/scores-fixtures/"
date = input(str("Input the date, hit Enter for today"))
src = url + date

#Scraping de la page web, balise abbr avec attribut title
html = requests.get(src).text
soup = bs(html, 'lxml')

select = soup.find_all('abbr')

fixtures = []
for abbr in select:
    if abbr.has_attr('title'):
        output = str(abbr['title'])
        fixtures.append(output)

#Création tableau excel, liste[position :: itération]

table = pd.DataFrame()
table['Home'] = pd.Series(fixtures[::2])
table['Away'] = pd.Series(fixtures[1::2])
name = 'fixtures' + '-' + date + '.xlsx'
table.to_excel(name, index=False)

=====EDIT====== I think it's gonna be easier if I explain the purpose of my Excel manual Source. It is a file that contains some very renowned fixtures and their outcome (the result), it's what we call in French "bête noire", basically a team that never wins against another. And now what I'm trying to do is for the program to look up the games of the day (or any certain date), match it against that Excel file to see if there's any "bête noire", if so, it should return the fixture with the winner (result/outcome).

EXAMPLE :

The scraped data  : 
Home Away
Liverpool Manchester U
Burnley West Ham
Arsenal Chelsea
Tottenham Brentford
 The Manual Excel source file that I have :
Home Away Result
Liverpool Manchester U Liverpool
TestTeam TestTeam TestTeam
TestTeam TestTeam TestTeam
TestTeam TestTeam TestTeam
Arsenal Chelsea Chelsea
TestTeam TestTeam TestTeam
TestTeam TestTeam TestTeam
TestTeam TestTeam TestTeam

Now the program should be able to match the data that it scraped against my Excel file, find the matching fixtures (Liverpool//Manchester U ; Arsenal//Chelsea), then return the result of the fixture.

I hope this has cleared any confusion, and thanks to everyone for their help I really appreciate it !

Upvotes: 1

Views: 984

Answers (2)

chitown88
chitown88

Reputation: 28565

The data you are needing/asking for here is all within the api. You don't need to use Selnium then. Just get all the events data and you can manipulate the dataframe as you need.

This will get all the data for you:

import requests
import pandas as pd
import datetime

today = datetime.datetime.now().strftime("%Y-%m-%d")
date_str = input("Input the date in the form of yyyy-mm-dd, hit Enter for today\n-> ")
if not date_str:
    date = today
else:
    date = datetime.datetime.strptime(date_str, "%Y-%m-%d").strftime("%Y-%m-%d")


url = 'https://push.api.bbci.co.uk/batch'
query = {
't':f'/data/bbc-morph-football-scores-match-list-data/endDate/{date}/startDate/{date}/todayDate/{today}/tournament/full-priority-order/version/2.4.6?timeout=5'}

jsonData = requests.get(url, params=query).json()

rows = []
matchData = jsonData['payload'][0]['body']['matchData']
for match in matchData:
    tournamentDatesWithEvents = match['tournamentDatesWithEvents']
    for k, v in tournamentDatesWithEvents.items():
        for i in v:
            events = i['events']
            for event in events:
                rows.append(event)
                
event_df = pd.json_normalize(rows)

Output: sample of just 1st 5 rows

Input the date in the form of yyyy-mm-dd, hit Enter for today
-> 2021-11-10

print(event_df.head(5).to_string())
      eventKey                  startTime  isTBC  minutesElapsed  minutesIntoAddedTime eventStatus eventStatusNote eventStatusReason eventOutcomeType eventType seriesWinner     cpsId cpsLive officials tournamentInfo eventActions startTimeInUKHHMM comment                      href            tournamentSlug homeTeam.key  homeTeam.scores.score  homeTeam.scores.halfTime  homeTeam.scores.fullTime homeTeam.scores.extraTime homeTeam.scores.shootout homeTeam.scores.aggregate homeTeam.scores.aggregateGoalsAway homeTeam.formation homeTeam.eventOutcome homeTeam.name.first  homeTeam.name.full homeTeam.name.abbreviation homeTeam.name.last awayTeam.key  awayTeam.scores.score  awayTeam.scores.halfTime  awayTeam.scores.fullTime awayTeam.scores.extraTime awayTeam.scores.shootout awayTeam.scores.aggregate awayTeam.scores.aggregateGoalsAway awayTeam.formation awayTeam.eventOutcome         awayTeam.name.first          awayTeam.name.full  awayTeam.name.abbreviation awayTeam.name.last eventProgress.period eventProgress.status venue.name.abbreviation    venue.name.videCode       venue.name.first        venue.name.full venue.homeCountry      tournamentName.first            tournamentName.full tournamentName.abbreviation
0  EFBO2257629  2021-11-10T17:00:00+00:00  False             NaN                   NaN  post-event          Result              None          regular   Regular         None      None    None        []           None         None             17:00    None                      None  international-friendlies    TFBB11658                      0                         0                         0                      None                     None                      None                               None               None                  loss              Kosovo              Kosovo                     Kosovo               None     TFBB1843                      2                         0                         0                      None                     None                      None                               None               None                   win                      Jordan                      Jordan                      Jordan               None             FULLTIME               RESULT   Stadiumi Fadil Vokrri  Stadiumi Fadil Vokrri  Stadiumi Fadil Vokrri  Stadiumi Fadil Vokrri              None  International Friendlies       International Friendlies              Int Friendlies
1  EFBO2245946  2021-11-10T17:45:00+00:00  False            90.0                   3.0  post-event          Result              None          regular   Regular         None      None    None        []           None         None             17:45    None                      None   womens-champions-league     TFBB9121                      4                         3                         4                      None                     None                      None                               None               None                   win  Barcelona Femenino  Barcelona Femenino         Barcelona Femenino               None     TFBB8706                      0                         0                         0                      None                     None                      None                               None               None                  loss  TSG 1899 Hoffenheim Ladies  TSG 1899 Hoffenheim Ladies  TSG 1899 Hoffenheim Ladies               None             FULLTIME               RESULT     Estadi Johan Cruyff    Estadi Johan Cruyff    Estadi Johan Cruyff    Estadi Johan Cruyff             Spain  Women's Champions League  UEFA Women's Champions League          Women's Champs Lge
2  EFBO2245947  2021-11-10T17:45:00+00:00  False            90.0                   4.0  post-event          Result              None          regular   Regular         None  59214028     yes        []           None         None             17:45    None  /sport/football/59214028   womens-champions-league    TFBB16926                      1                         0                         1                      None                     None                      None                               None               None                  loss       HB Køge Women       HB Køge Women              HB Køge Women               None     TFBB6272                      5                         1                         5                      None                     None                      None                               None               None                   win               Arsenal Women               Arsenal Women               Arsenal Women               None             FULLTIME               RESULT   Capelli Sport Stadion  Capelli Sport Stadion  Capelli Sport Stadion  Capelli Sport Stadion           Denmark  Women's Champions League  UEFA Women's Champions League          Women's Champs Lge
3  EFBO2245948  2021-11-10T20:00:00+00:00  False            90.0                   6.0  post-event          Result              None          regular   Regular         None      None    None        []           None         None             20:00    None                      None   womens-champions-league    TFBB16019                      0                         0                         0                      None                     None                      None                               None               None                  loss       Benfica Women       Benfica Women              Benfica Women               None    TFBB17702                      1                         0                         1                      None                     None                      None                               None               None                   win             BK Häcken Women             BK Häcken Women             BK Häcken Women               None             FULLTIME               RESULT    Caixa Futebol Campus   Caixa Futebol Campus   Caixa Futebol Campus   Caixa Futebol Campus          Portugal  Women's Champions League  UEFA Women's Champions League          Women's Champs Lge
4  EFBO2245949  2021-11-10T20:00:00+00:00  False            90.0                   4.0  post-event          Result              None          regular   Regular         None      None    None        []           None         None             20:00    None                      None   womens-champions-league     TFBB9100                      2                         0                         2                      None                     None                      None                               None               None                   win      Lyon Féminines      Lyon Féminines             Lyon Féminines               None     TFBB6597                      1                         1                         1                      None                     None                      None                               None               None                  loss    FC Bayern München Ladies    FC Bayern München Ladies    FC Bayern München Ladies               None             FULLTIME               RESULT        Groupama Stadium       Groupama Stadium       Groupama Stadium       Groupama Stadium            France  Women's Champions League  UEFA Women's Champions League          Women's Champs Lge

Now can just pull out the columns you want:

results = event_df[['homeTeam.name.full','homeTeam.eventOutcome','awayTeam.name.full','awayTeam.eventOutcome']]   

Output:

print(results.to_string())
      homeTeam.name.full homeTeam.eventOutcome          awayTeam.name.full awayTeam.eventOutcome
0                 Kosovo                  loss                      Jordan                   win
1     Barcelona Femenino                   win  TSG 1899 Hoffenheim Ladies                  loss
2          HB Køge Women                  loss               Arsenal Women                   win
3          Benfica Women                  loss             BK Häcken Women                   win
4         Lyon Féminines                   win    FC Bayern München Ladies                  loss
5                Walsall                  loss         Forest Green Rovers                   win
6            Exeter City                   win              Bristol Rovers                  loss
7         Wigan Athletic                   win             Shrewsbury Town                  loss
8              Broomhill                  loss              Bo'ness United                   win
9      Cumbernauld Colts                  loss                   Rangers B                   win
10       Dalbeattie Star                  loss               East Kilbride                   win
11    East Stirlingshire                   tie      University of Stirling                   tie
12                Grêmio                   win                  Fluminense                  loss
13  Athletico Paranaense                   win                       Ceará                  loss
14      Atlético Mineiro                   win                 Corinthians                  loss
15                Santos                   win         Red Bull Bragantino                  loss
16             Palmeiras                   win         Atlético Goianiense                  loss

EDIT: ADDITIONAL CODE

import pandas as pd

data = [['Liverpool','Manchester U'],
        ['Burnley'  ,'West Ham'],
        ['Arsenal'  ,'Chelsea'],
        ['Tottenham','Brentford']]

scraped_data = pd.DataFrame(data, columns = ['Home',    'Away'])



data = [['Liverpool',   'Manchester U', 'Liverpool'],
    ['TestTeamA',   'TestTeamB',    'TestTeamA'],
    ['Arsenal', 'Chelsea',  'Chelsea']]

excel_data = pd.DataFrame(data, columns = ['Home',  'Away', 'Result'])


result = scraped_data.merge(excel_data, how='left', on=['Home','Away']).dropna(subset=['Result'])

Output:

print(result)
        Home          Away     Result
0  Liverpool  Manchester U  Liverpool
2    Arsenal       Chelsea    Chelsea

Upvotes: 1

Wilian
Wilian

Reputation: 1257

The table of this url is not so easy to manipulate, I suggest using selenium:

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service

options = webdriver.ChromeOptions() 
options.add_argument('--no-default-browser-check')
options.add_argument('--log-level=3')
options.add_argument('--headless')
service = Service('chromedriver.exe')
driver = webdriver.Chrome(options=options, service=service)
date = input(str("Input the date, hit Enter for today: "))
driver.get("https://www.bbc.com/sport/football/scores-fixtures/" + date)
games = driver.find_elements(By.XPATH, "//article[@data-event-id]")
ListHome = []
ListAway = []
ListTime = []
for game in games:
        ListHome.append(game.find_element(By.XPATH,".//span[contains(@class,'team--time-home')]").text)
        ListAway.append(game.find_element(By.XPATH,".//span[contains(@class,'team--time-away')]").text)
df = pd.DataFrame(list(zip(ListHome, ListAway)), columns =['Home', 'Away'])
df.to_excel('name.xlsx', index=False)

Use pd.merge to compare with your other spreadsheet.

Upvotes: 0

Related Questions