Reputation: 31
I want to scrape data from a table, but only the rows after a certain date. The code below grabs the first date in the table with class 'table table-hover small horsePerformance' (URL below), but how would I write a for loop that only extracts data from, say, 11 Oct 2020 onward, i.e. that date and all rows listed above it in the table?
http://www.harness.org.au/racing/horse-search/?horseId=813476
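import requests
from bs4 import BeautifulSoup as bs

horseurl = "http://www.harness.org.au/racing/horse-search/?horseId=813476"
headers = {"User-Agent": "Mozilla/5.0"}  # assumed; the original headers were not shown

with requests.Session() as s:
    try:
        webpage_response = s.get(horseurl, headers=headers)
    except requests.exceptions.ConnectionError:
        raise SystemExit("Connection refused")  # no response object exists here

soup = bs(webpage_response.content, "html.parser")
horseresult6 = soup.find('table', class_='table table-hover small horsePerformance')
# first and second date cells in the table
daysbetween = horseresult6.find('td', class_='date').get_text().strip()
daysbetween24 = horseresult6.find('td', class_='date').find_next('td', class_='date').get_text().strip()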
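I think the loop should look something like this:
for tr in horseresult6.find_all('tr')[1:]:  # skip the header row
    daysbetween = tr.find('td', class_='date').get_text().strip()
    if xdate > daysbetween:  # both are strings here, so this compares them alphabetically
        pass  # do something with the row
    else:
        continue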
When I try this, it doesn't seem to work.
Upvotes: 1
Views: 807
Reputation: 20042
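You can compare dates with the < and > operators, as long as you first parse the date strings (for example with time.strptime). Comparing the raw strings orders them alphabetically, not chronologically.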
Here's how:
import time

import requests
from bs4 import BeautifulSoup

horse_url = "http://www.harness.org.au/racing/horse-search/?horseId=813476"

with requests.Session() as s:
    try:
        webpage_response = s.get(horse_url)
    except requests.exceptions.ConnectionError:
        raise SystemExit("Connection refused")  # no response object exists here

table = BeautifulSoup(
    webpage_response.content,
    "html.parser",
).find('table', class_='table table-hover small horsePerformance')

target_date = "11 Oct 2020"
for row in table.find_all("tr")[1:]:  # skipping the header
    date = row.find("td", class_="date").find("a").getText()  # table date
    if time.strptime(date, "%d %b %Y") >= time.strptime(target_date, "%d %b %Y"):  # comparing the dates
        # do your parsing here, this is just an example
        print(f'{date} - {row.find("td", class_="stake").getText(strip=True)}')
Output:
05 Apr 2021 - $4,484
29 Mar 2021 - $595
23 Mar 2021 - $4,484
12 Mar 2021 - $220
08 Mar 2021 - $181
02 Mar 2021 - $263
19 Feb 2021 - $180
12 Feb 2021 - $1,200
26 Jan 2021 - $4,484
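To see why the loop in the question misbehaves, here is a minimal sketch comparing two dates taken from this table, first as raw strings and then after parsing. The variable names a, b and fmt are illustrative only.

```python
import time

# Two dates in the table's "DD Mon YYYY" format
a = "05 Apr 2021"
b = "11 Oct 2020"

# Raw strings compare character by character: "0" < "1", so this is False
# even though 5 April 2021 is later than 11 October 2020.
print(a > b)  # False

# Parsed struct_time values compare by year, then month, then day.
fmt = "%d %b %Y"
print(time.strptime(a, fmt) > time.strptime(b, fmt))  # True
```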
To go back in time instead, set a new target date and flip the comparison to <=:
target_date = "26 Jan 2021"
for row in table.find_all("tr")[1:]:  # skipping the header
    date = row.find("td", class_="date").find("a").getText()  # table date
    if time.strptime(date, "%d %b %Y") <= time.strptime(target_date, "%d %b %Y"):  # comparing the dates
        # do your parsing here, this is just an example
        print(f'{date} - {row.find("td", class_="stake").getText(strip=True)}')
Output:
26 Jan 2021 - $4,484
14 Sep 2020 - $100
11 Sep 2020 - $616
04 Sep 2020 - $180
21 Aug 2020 - $180
17 Aug 2020 - $595
28 Jul 2020 - $4,291
21 Jul 2020 - $3,523
13 Jul 2020 - $300
30 Jun 2020 - $1,173
15 Jun 2020 - $100
30 May 2020 - $3,523
22 May 2020 - $500
12 May 2020 - $963
05 May 2020 - $3,523
02 May 2020 - $1,986
24 Apr 2020 - $144
09 Apr 2020 - $144
30 Mar 2020 - $1,225
10 Mar 2020 - $100
09 Dec 2019 - $595
02 Dec 2019 - $4,484
26 Nov 2019 - $4,484
19 Nov 2019 - $100
02 Nov 2019 - $4,484
27 Oct 2019 - $2,562
13 Oct 2019 - $700
31 May 2019 - $1,000
21 May 2019 - $4,484
07 May 2019 - $1,225
27 Apr 2019 - $595
21 Apr 2019 - $0
14 Apr 2019 - $0
07 Apr 2019 - $0
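Both outputs come back newest first. If the table is always sorted that way (an assumption, not something the site guarantees), you can stop scanning once you pass the cutoff instead of checking every row. Below is a minimal, network-free sketch using datetime and itertools. The helper names parse, rows_on_or_after and rows_on_or_before are hypothetical, and sample is a hand-copied list of (date, stake) pairs from the output above, standing in for the scraped rows.

```python
from datetime import datetime
from itertools import dropwhile, takewhile

FMT = "%d %b %Y"  # the table's date format, e.g. "26 Jan 2021"


def parse(date_text):
    """Turn a table date string into a datetime.date."""
    return datetime.strptime(date_text, FMT).date()


def rows_on_or_after(rows, target):
    """Return an iterator over rows dated on or after target.

    Assumes rows are ordered newest first, so iteration stops at the
    first older row instead of scanning the rest of the table.
    """
    cutoff = parse(target)
    return takewhile(lambda row: parse(row[0]) >= cutoff, rows)


def rows_on_or_before(rows, target):
    """Skip the newer rows, then return everything from target backwards."""
    cutoff = parse(target)
    return dropwhile(lambda row: parse(row[0]) > cutoff, rows)


# Hypothetical stand-in for the scraped (date, stake) pairs
sample = [
    ("05 Apr 2021", "$4,484"),
    ("26 Jan 2021", "$4,484"),
    ("14 Sep 2020", "$100"),
    ("11 Sep 2020", "$616"),
]

for row_date, stake in rows_on_or_after(sample, "11 Oct 2020"):
    print(f"{row_date} - {stake}")
```

takewhile and dropwhile only look at the leading run of rows, so they give the same result as the full loop only when the ordering assumption holds; if it might not, a plain filter over every row is the safer choice.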
Upvotes: 1