Brad Langtry

Reputation: 31

Scrape data from a date onwards

I want to scrape rows from a table only when they fall on one side of a certain date. The code below grabs the first two dates in the table (URL below), but how would I write a for loop that extracts only the rows dated 11-Oct-2020 or earlier?

The table is the one with the class 'table table-hover small horsePerformance' on this page:

http://www.harness.org.au/racing/horse-search/?horseId=813476


with requests.Session() as s:
   try:
       webpage_response = s.get(horseurl, headers=headers)
   except requests.exceptions.ConnectionError:
        r.status_code = "Connection refused"
                            
   soup = bs(webpage_response.content, "html.parser")
   horseresult6 = soup.find('table', class_='table table-hover small horsePerformance')
   daysbetween = horseresult6.find('td', class_='date').get_text().strip()
   daysbetween24 = horseresult6.find('td', class_='date').find_next('td', class_='date').get_text().strip()

However I think it should look like

for tr in horseresult6.find_all('tr')[1:]: 
     daysbetween = tr.find('td', class_='date').get_text().strip()
     if xdate > daysbetween:
         do something
     else:
         continue

When I try this, it doesn't seem to work.

Upvotes: 1

Views: 807

Answers (1)

baduker

Reputation: 20042

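You can compare dates with the < and > operators, but only after parsing them. Compared as plain strings, they are ordered character by character rather than chronologically. time.strptime turns each date string into a struct_time, and those compare in chronological order.

Here's how: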

import time

import requests
from bs4 import BeautifulSoup

horse_url = "http://www.harness.org.au/racing/horse-search/?horseId=813476"

with requests.Session() as s:
    try:
        webpage_response = s.get(horse_url)
    except requests.exceptions.ConnectionError as exc:
        # webpage_response is never assigned if the request fails, so stop here
        raise SystemExit(f"Connection refused: {exc}")

    table = BeautifulSoup(
        webpage_response.content,
        "html.parser",
    ).find('table', class_='table table-hover small horsePerformance')

    target_date = "11 Oct 2020"

    for row in table.find_all("tr")[1:]:  # skipping the header
        date = row.find("td", class_="date").find("a").getText()  # table date
        if time.strptime(date, "%d %b %Y") >= time.strptime(target_date, "%d %b %Y"):  # comparing the dates
            # do your parsing here, this is just an example
            print(f'{date} - {row.find("td", class_="stake").getText(strip=True)}')

Output:

05 Apr 2021 - $4,484
29 Mar 2021 - $595
23 Mar 2021 - $4,484
12 Mar 2021 - $220
08 Mar 2021 - $181
02 Mar 2021 - $263
19 Feb 2021 - $180
12 Feb 2021 - $1,200
26 Jan 2021 - $4,484
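
As for why the loop in the question doesn't behave as expected: a likely cause is that xdate and daysbetween are both plain strings there, so > compares them character by character. The snippet below is a minimal sketch of the difference, using datetime.strptime instead of time.strptime (either works). The helper name parse_date is made up for illustration, and the format string assumes the site's "05 Apr 2021" style with English month abbreviations.

```python
from datetime import datetime

DATE_FORMAT = "%d %b %Y"  # assumed format, e.g. "11 Oct 2020"


def parse_date(text):
    """Hypothetical helper: turn a table date string into a datetime."""
    return datetime.strptime(text.strip(), DATE_FORMAT)


# As plain strings, "05 Apr 2021" sorts before "11 Oct 2020" because "0" < "1".
string_result = "05 Apr 2021" > "11 Oct 2020"

# Once parsed, the comparison is chronological.
parsed_result = parse_date("05 Apr 2021") > parse_date("11 Oct 2020")

print(string_result, parsed_result)
```

Note that %b depends on the locale, so this assumes an English (or default C) locale.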

Going back in time:

    target_date = "26 Jan 2021"

    for row in table.find_all("tr")[1:]:  # skipping the header
        date = row.find("td", class_="date").find("a").getText()  # table date
        if time.strptime(date, "%d %b %Y") <= time.strptime(target_date, "%d %b %Y"):  # comparing the dates
            # do your parsing here, this is just an example
            print(f'{date} - {row.find("td", class_="stake").getText(strip=True)}')

Output:

26 Jan 2021 - $4,484
14 Sep 2020 - $100
11 Sep 2020 - $616
04 Sep 2020 - $180
21 Aug 2020 - $180
17 Aug 2020 - $595
28 Jul 2020 - $4,291
21 Jul 2020 - $3,523
13 Jul 2020 - $300
30 Jun 2020 - $1,173
15 Jun 2020 - $100
30 May 2020 - $3,523
22 May 2020 - $500
12 May 2020 - $963
05 May 2020 - $3,523
02 May 2020 - $1,986
24 Apr 2020 - $144
09 Apr 2020 - $144
30 Mar 2020 - $1,225
10 Mar 2020 - $100
09 Dec 2019 - $595
02 Dec 2019 - $4,484
26 Nov 2019 - $4,484
19 Nov 2019 - $100
02 Nov 2019 - $4,484
27 Oct 2019 - $2,562
13 Oct 2019 - $700
31 May 2019 - $1,000
21 May 2019 - $4,484
07 May 2019 - $1,225
27 Apr 2019 - $595
21 Apr 2019 - $0
14 Apr 2019 - $0
07 Apr 2019 - $0
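
For the cutoff in the question (11 Oct 2020), the same comparison can be wrapped in a small function. This is only a sketch: rows_on_or_before is a hypothetical name, and the rows are hard-coded (date, stake) pairs copied from the output above rather than scraped, so it runs without a network connection.

```python
from datetime import datetime

DATE_FORMAT = "%d %b %Y"  # assumed format, e.g. "11 Oct 2020"


def rows_on_or_before(rows, cutoff_text):
    """Hypothetical helper: keep (date, stake) pairs dated on or before the cutoff."""
    cutoff = datetime.strptime(cutoff_text, DATE_FORMAT)
    return [
        (date_text, stake)
        for date_text, stake in rows
        if datetime.strptime(date_text, DATE_FORMAT) <= cutoff
    ]


# A few rows taken from the output above, in the table's order.
sample_rows = [
    ("12 Feb 2021", "$1,200"),
    ("26 Jan 2021", "$4,484"),
    ("14 Sep 2020", "$100"),
    ("11 Sep 2020", "$616"),
]

older_rows = rows_on_or_before(sample_rows, "11 Oct 2020")
for date_text, stake in older_rows:
    print(f"{date_text} - {stake}")
```

In the scraping loop, the pairs would come from each row's "date" and "stake" cells instead of the hard-coded list.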

Upvotes: 1
